kanji koohii FORUM
Mighty Morphin Morphology - Printable Version




Mighty Morphin Morphology - nest0r - 2011-06-28

Yes, because set theory is so fascinating to just browse through for people who click on a link that sounds like a children's show. ;p

Though I did eventually look at the Wikipedia entries for the Symmetric Differences/Intersection/Union thing. ;p I have to admit there's something sexy about that set stuff.

Do you think you could add more options for unknowns/iPlusN to the GUI db manager? I hesitate to make suggestions, but it could be as simple as adding sentence-level and paragraph-level information to the results in terms of location in the original text, and exporting/importing into Anki.

I think sentence level and paragraph level would be taken care of by using sentence punctuation in conjunction with line breaks. If there are multiple sentences with a single line break, that's presumably a paragraph, no? If you had 2 tallies, this would occasionally result in duplicate numbers for single sentence breaks.

I'm sure you can think of better stuff? Right now I have about 5 ideas of applying your GUI that are floating around, but they're roundabout because I'm imagining duplicating these functionalities through regex and/or importing into Anki just to get specific information before re-exporting.

Edit: Perhaps also more export formatting options? Shrug.

By the way, here's what I have so far, on the small scale:

Input, say, an article. Run it through MorphMan as database A and use known.db as B, then do A-B to get the unknowns. Copy that list and turn into regex and apply to original file:

(Using Ultraedit with Perl regexp engine, this seems to work):

Find: (Word1|Word2|Word3)

Replace: <b>\1</b> (or whatever formatting for visual aid) (that's a backslash and a one, in case it's not showing up properly)

Then you have the original with the unknowns formatted, so you can go through with Rikaisan and create cards with their source sentences, or just get a good impression of unknowns, whathaveyou.
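
For anyone without Ultraedit, roughly the same A-B plus find/replace step can be sketched in Python. This is only an illustration: the function name and the sample word sets are made up, and it assumes the word lists hold the forms exactly as they appear in the text (a list of base forms would miss inflected words).

```python
import re

def highlight_unknowns(text, known, found, open_tag="<b>", close_tag="</b>"):
    """Wrap every word that is in `found` but not in `known` with the given tags."""
    unknowns = set(found) - set(known)  # the A - B step
    if not unknowns:
        return text
    # Longest alternatives first, so a short word never pre-empts a longer one.
    alternatives = sorted(unknowns, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(w) for w in alternatives) + ")")
    # A function replacement avoids backslash surprises in the tag strings.
    return pattern.sub(lambda m: open_tag + m.group(1) + close_tag, text)

# Hypothetical sample data standing in for database A and known.db.
article = "猫が魚を食べた。"
highlighted = highlight_unknowns(
    article,
    known={"猫", "が", "を"},
    found={"猫", "が", "魚", "を", "食べた"},
)
print(highlighted)  # 猫が<b>魚</b>を<b>食べた</b>。
```

Escaping each word with re.escape matters once the list contains punctuation or brackets, which a hand-built (Word1|Word2) pattern would otherwise treat as regex syntax.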

If there's also a regex or somesuch for Finding sentences containing those words and counting the unknowns and adding that tally to the end of a sentence, and doing so for paragraphs...
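
A rough sketch of what that tally could look like outside a regex engine, assuming each non-empty line is a paragraph and sentences end in 。, ！ or ？. The function names are hypothetical, and the count is the number of distinct unknown words found in a sentence by plain substring matching, which is cruder than real morpheme analysis.

```python
import re

def split_sentences(paragraph):
    """Split after 。！？, keeping the punctuation attached to its sentence."""
    parts = re.findall(r"[^。！？]+[。！？]*", paragraph)
    return [p for p in parts if p.strip()]

def tally_unknowns(text, unknowns):
    """Return (paragraph_no, sentence_no, unknown_count, sentence) tuples."""
    rows = []
    paragraphs = [p for p in text.split("\n") if p.strip()]
    for p_no, paragraph in enumerate(paragraphs, start=1):
        for s_no, sentence in enumerate(split_sentences(paragraph), start=1):
            # Distinct unknown words present in this sentence (substring match).
            count = sum(1 for word in unknowns if word in sentence)
            rows.append((p_no, s_no, count, sentence))
    return rows

# Hypothetical sample text: two paragraphs, three sentences.
sample = "猫が魚を食べた。犬は寝た。\n空は青い。"
rows = tally_unknowns(sample, {"魚", "食べた", "青い"})
for p_no, s_no, count, sentence in rows:
    # Append the tally to the end of each sentence.
    print("{} [p{} s{} unknowns={}]".format(sentence, p_no, s_no, count))
```

Sorting the rows by the count column would then give the sentences closest to i+1 first.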

Otherwise I might end up breaking up all sentences/paragraphs, importing to Anki, running MorphMan to find unknowns/iPlusN and attach the tallies to the ends of the sentence fields, exporting back to text and fixing any formatting errors. From there I could add formatting for clicking from one i+N to another...

There's also adding clozes for unknowns in collocations, but that's neither here nor there.

Know what else would be interesting? A Firefox plugin that consults the known.db to darken or highlight knowns or unknowns when you visit any webpage? Perhaps like FoxReplace you could hit a hotkey to trigger this rather than auto-enable on page load.

There's also that stuff that lets you mouseover and select page elements, perhaps that could be used to refine which areas of a page you want to process.

Edit 2: Ooh, you know what? With FoxReplace and also the Vocabulary Highlighter plugin, you can import lists in xml format. So you could turn dbs or lists of unknowns/knowns into xml format and have those load when you're browsing! With FoxReplace I bet you could really get flexibility with the visual formatting, though I think Vocab Highlighter has various options also. But if you also have interval levels, you can have different shades... Okay that might be overkill. ;p


Mighty Morphin Morphology - Boy.pockets - 2011-06-29

@overture
bug report for mac:

Code:
An error occurred in a plugin. Please contact the plugin author.
Please do not file a bug report with Anki.

Traceback (most recent call last):
  File "/Users/YounKnowWho/Library/Application Support/Anki/plugins/morph/manager.py", line 101, in onExtractTxtFile
    ms = M.file2ms( srcPath )
  File "/Users/YounKnowWho/Library/Application Support/Anki/plugins/morph/morphemes.py", line 102, in file2ms
    inp = unicode( open( path, 'r' ).read(), 'utf-8' )
IOError: [Errno 2] No such file or directory: PyQt4.QtCore.QString(u'/Users/YouKnowWho/Documents/Japanese/(\u4e00\u822c\u5c0f\u8aac) [J\u30fbK\u30fb\u30ed\u30fc\u30ea\u30f3\u30af\u3099] \u30cf\u30ea\u30fc\u30fb\u30db\u309a\u30c3\u30bf\u30fc\u30b7\u30ea\u30fc\u30b9\u3099 01-07 (\u66ab\u5b9a\u7248) (\u9752\u7a7a\u6587\u5eab\u5bfe\u5fdctxt)/test 2.txt')
This happened when I tried to "export morphemes from file" with Japanese in the source file path. The error message should read "/Users/YouKnowWho/Documents/Japanese/(日本語) [J・K・日本語] 日本語・日本語 01-07 (日本語) (日本語txt)"

This is not a show-stopper - I simply moved the file to another directory; just thought I would let you know. Happy to do more tests if needed.


Mighty Morphin Morphology - nest0r - 2011-06-29

Looks like it doesn't like Japanese folder/file names? Happens with a lot of programs/plugins, I think. Or is it okay with Japanese file names but not folders? That could be good to know.


Mighty Morphin Morphology - nest0r - 2011-06-29

@overture

I believe I have discovered that at 4051 characters MorphMan instantly processes a text, but at 4052 it will cause Anki to hang.


Mighty Morphin Morphology - vosmiura - 2011-06-29

I think I ran into the same issue, and I think it's a problem with MeCab. I got around it by changing the import to send one line at a time to MeCab instead of the whole file in one go.
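
The general shape of that workaround, as a minimal sketch rather than the actual plugin code: analyze_line is a hypothetical stand-in for whatever function hands text to MeCab, and the fake analyzer below just splits on spaces so the example runs without MeCab installed.

```python
def lines_to_morphemes(lines, analyze_line):
    """Feed text to the analyzer one line at a time instead of all at once.

    lines        -- any iterable of strings, e.g. an open text file
    analyze_line -- hypothetical callable: one line in, list of morphemes out
    """
    morphemes = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines rather than sending them on
            continue
        morphemes.extend(analyze_line(line))
    return morphemes

def fake_analyze(line):
    """Stand-in for the MeCab call: treats spaces as morpheme boundaries."""
    return line.split()

sample = ["猫 が 魚 を 食べ た\n", "\n", "犬 は 寝 た\n"]
result = lines_to_morphemes(sample, fake_analyze)
print(len(result))  # 10
```

Since an open file is itself an iterable of lines, the same function could be called as lines_to_morphemes(open(path, encoding="utf-8"), analyze_line), so no single call ever sees more than one line of text.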


Mighty Morphin Morphology - nest0r - 2011-06-29

Does that affect the speed? How'd you do it? ;p

I noticed with this: http://www.hyuki.com/trans/prince.html - It's 4066 (the character cutoff point where it goes from instant success to indefinite hang). - 610 morphemes.

For: http://www.hyuki.com/trans/magi.html - It's 4051. - 654 morphemes.


Mighty Morphin Morphology - nest0r - 2011-06-29

Hmm, something to think about while I work on using regexp to count unknowns per sentence via the GUI and whatnot... For smaller articles, microblog entries, paragraphs and such, I wonder if I should try putting those entire swaths of text on individual cards and sorting by iPlusN/set unknowns. It would be a MorphMan-augmented Japanese version of the cards I do for specific non-Japanese subjects, where I just grab passages from certain texts I'm reading, paste them into empty cards, read them, and don't really grade them unless they seem very unfamiliar. I used to throw in arbitrary clozes just to add a layer of engagement that also developed negative capability (rather than for memorization, more like working on inferring the gist, with feedback), and I bet I could also tinker with refining those based on a paper I read and MorphMan...

Meanwhile another use of MorphMan I've been thinking of that Katsuo might like, is to import n-gram lists from AntConc, i.e. lists of collocations from novels or articles or whathaveyou, and generate clozes for the unknowns...

Could be interesting to have something like rss2srs. ;p I tried to duplicate a similar functionality before with a site shuffler (http://forum.koohii.com/showthread.php?tid=5267), like you could visit random bookmarks but specify domains and such (or a range of URLs); perhaps even give it a kind of spacing schedule (the more one-sided your cards, i.e. when it's not retrieval, the more the spacing rather than testing effect becomes important, in terms of accomplishing desirable difficulty).

Hmm, I wonder if I might do something with Transcriber, Text-to-Speech, and any text, to customize and automate the process of creating decks from .trs files with subs2srs... I bet it wouldn't be hard to use a find/replace to turn all paragraphs in a text into card entries for an Anki import text, either.


Mighty Morphin Morphology - overture2112 - 2011-06-29

nest0r Wrote:Looks like it doesn't like Japanese folder/file names? Happens with a lot of programs/plugins, I think. Or is it okay with Japanese file names but not folders? That could be good to know.
Interesting Boy.pockets. I'll try playing around with the path code to see what's going on.


Mighty Morphin Morphology - vosmiura - 2011-06-29

nest0r Wrote:Does that affect the speed? How'd you do it? ;p
It runs very quickly. Check the "- mod" plugin, I think it has that fix ;)


Mighty Morphin Morphology - nest0r - 2011-06-29

vosmiura Wrote:
nest0r Wrote:Does that affect the speed? How'd you do it? ;p
It runs very quickly. Check the "- mod" plugin, I think it has that fix ;)
Thanks! It works a treat. I just modded the morphemes.py of the original plugin to include a bit of what you wrote, replacing the original def file2ms section with: http://pastebin.com/QG7W0bfA


Mighty Morphin Morphology - nest0r - 2011-06-30

Not really the thread for it, but since I posted the above about different types of cards, this looks interesting, for other possibilities of integrating Anki with online articles and possibly feeds: http://www.frankraiser.de/drupal/AnkiIR (incremental reading plugin for Anki)


Mighty Morphin Morphology - MacMiller - 2011-08-09

Sorry for bumping this thread--
I've used this plugin for about 4 or 5 hours and it's been incredible...
Lol 国境付近で小競り合いが絶えねぇし 北は北で 大国ドラクマが控えてる 一応不可侵条約を結んでるけど (FMA Brotherhood episode 35)
I don't know if that's a hard sentence or not, but for me it has a rating of i+15.
I probably would have tried to learn the sentence if it weren't for that rating.

I love this forum. :DDD
Thank you all so much for your input.


Mighty Morphin Morphology - nest0r - 2011-08-21

Added this for now. Tentatively.

http://rtkwiki.koohii.com/wiki/Anki_tools_for_Japanese_study#Morpheme_Databases


Mighty Morphin Morphology - overture2112 - 2011-08-21

nest0r Wrote:Added this for now. Tentatively.

http://rtkwiki.koohii.com/wiki/Anki_tools_for_Japanese_study#Morpheme_Databases
Fancy. Perhaps a less silly name is in order for the next version. Suggestions?


Mighty Morphin Morphology - Boy.pockets - 2011-08-21

overture2112 Wrote:
nest0r Wrote:Added this for now. Tentatively.

http://rtkwiki.koohii.com/wiki/Anki_tools_for_Japanese_study#Morpheme_Databases
Fancy. Perhaps a less silly name is in order for the next version. Suggestions?
I like Nest0r's "Morph Man" and my "Morphin", but as it is a plug-in it should probably also be a bit descriptive of what it does.


Mighty Morphin Morphology - overture2112 - 2011-08-21

Boy.pockets Wrote:I like Nest0r's "Morph Man" and my "Morphin", but as it is a plug-in it should probably also be a bit descriptive of what it does.
Yes, sadly the current name doesn't allude to its primary use as a means to rank facts in your deck according to ease.


Mighty Morphin Morphology - nest0r - 2011-08-21

I don't know, something about words and knowledge. It'll always be MorphMan to me. ;p


Mighty Morphin Morphology - nest0r - 2011-08-22

Okay, added something good: http://rtkwiki.koohii.com/wiki/Anki_tools_for_Japanese_study#Morpheme_Databases

Oops. Added a new archive with new link. Guess you can't actually save as single column dbs.


Mighty Morphin Morphology - overture2112 - 2011-08-24

nest0r Wrote:Okay, added something good: http://rtkwiki.koohii.com/wiki/Anki_tools_for_Japanese_study#Morpheme_Databases.
Neat, in return I'll add something good too:

Morph Man 2 has been released!

Good news:

It automatically makes DBs for all your decks, including filtered versions which only contain morphemes which have a corresponding card of at least some level of maturity.

It also automatically updates i+N, unknowns, vocab rank, morph man index and sorts your new card order (via card creation time manipulation) for you. Constantly, in the background, across all your decks.

Bad news:

The database format has changed and nest0r's work above is slightly less awesome.

--

Bug reports, better default configs, feature suggestions, etc are welcome.

EDIT: nest0r, what all else do you want done with the card maturity information?


Mighty Morphin Morphology - nest0r - 2011-08-24

Whoa. Awesome. Do you uh, think you could post more instructions/explanations/etc.? So much stuff. What is the intervals thing? You said something about new cards. Can I tell it not to mess with new cards? What about the threshold stuff besides mature threshold?

What format's the .db stuff in? Seems hard to manipulate the word lists now, using them to format stuff outside MorphMan (for example, to use the list to format the knowns/unknowns for reading).

Umm, what else. Uh, so, how can I turn this into a card maturity reminder plugin thingy?

Or perhaps there's a way to add a functionality where when all cards with a certain tag have been merged with the known.db per maturity threshold, it tells you, or tags cards whose Expression fields are entirely known, or cards that are mature?

Also, did you include that bit from the mod that lets you extract morphemes from larger files without MorphMan dying?

Also, that dropdown list, how do you add stuff/remove stuff, or if it's automatic, how's it activated/how's it choose what decks to add?


Mighty Morphin Morphology - overture2112 - 2011-08-24

nest0r Wrote:Whoa. Awesome. Do you uh, think you could post more instructions/explanations/etc.? So much stuff.
There's a started attempt in morph/readme.txt, which I'll start actively improving.

nest0r Wrote:What is the intervals thing?
It keeps track of where it found every morpheme. For example, if it's from an anki deck it knows the fact id and field, the maturities of every card for that fact, and the highest maturity.

So the interval 5 db contains all morphemes found in facts that have a card with an interval of at least 5 days. Note that Anki considers 21 days mature.
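
As an illustration only (the record layout and names below are invented, not the plugin's real classes), filtering by interval could look something like this:

```python
from collections import namedtuple

# Hypothetical record of where a morpheme was seen: the fact, the field,
# and the intervals (in days) of every card belonging to that fact.
Location = namedtuple("Location", "fact_id field card_intervals")

def morphemes_at_interval(db, min_days):
    """Return the morphemes seen in at least one fact whose most mature
    card has an interval of min_days or more.

    db maps each morpheme to a list of Location records.
    """
    return {
        morpheme
        for morpheme, locations in db.items()
        if any(max(loc.card_intervals, default=0) >= min_days for loc in locations)
    }

# Made-up sample data.
db = {
    "食べる": [Location(1, "Expression", [30, 2])],
    "魚": [Location(2, "Expression", [3])],
    "空": [Location(3, "Expression", [0])],
}
interval_5 = morphemes_at_interval(db, 5)
print(interval_5)  # {'食べる'}
```

With these sample intervals, only 食べる survives the 5 day cutoff, and it would also be the only entry in a 21 day (mature) db.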

Note: currently if you import a text file it just saves the name of the file, but it could be expanded to store byte index, line number, paragraph number, etc if you can decide how to determine those.

nest0r Wrote:You said something about new cards. Can I tell it not to mess with new cards?
Did I? And no. It only modifies the fields you specify (configurable in the gui, defaulting to the names morph man 1.0 used) and card creation timestamps, so hopefully that's not a problem?


nest0r Wrote:What about the threshold stuff besides mature threshold?
Currently it works off of the "known threshold" to decide what gets merged into known.db. The mature and learnt thresholds don't do anything at the moment, but I was brainstorming ideas (like perhaps everything not mature goes in the unknowns field but i+N is done based on knowns?).


nest0r Wrote:What format's the .db stuff in? Seems hard to manipulate the word lists now, using them to format stuff outside MorphMan (for example, to use the list to format the knowns/unknowns for reading).
It is a compressed (gzip) serialized python object (morphemes.MorphDb via pickle). You can load a db in the morph man gui and copy/paste from there, or I can make it also save a version of the db with less info (like just the morpheme base form / reading) if you think that'd be useful.
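
For reference, a gzip-compressed pickle can be written and read with the standard library alone. The sketch below is Python 3 and uses a plain dict of base form to reading as a stand-in for the real object; loading an actual morph man .db would additionally need the plugin's morphemes module to be importable, since pickle has to find the MorphDb class. The helper names are made up, and the plain-text export is just one guess at what a "less info" version might look like.

```python
import gzip
import os
import pickle
import tempfile

def save_db(obj, path):
    """Pickle obj and gzip-compress it into path."""
    with gzip.open(path, "wb") as handle:
        pickle.dump(obj, handle)

def load_db(path):
    """Inverse of save_db. Only unpickle files you trust."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)

def export_plain(db, path):
    """Write one 'base form<TAB>reading' line per entry."""
    with open(path, "w", encoding="utf-8") as out:
        for base, reading in sorted(db.items()):
            out.write(base + "\t" + reading + "\n")

# Stand-in data, not the real MorphDb layout.
demo = {"魚": "サカナ", "食べる": "タベル"}
folder = tempfile.mkdtemp()
db_path = os.path.join(folder, "demo.db")
txt_path = os.path.join(folder, "demo.txt")

save_db(demo, db_path)
restored = load_db(db_path)
export_plain(restored, txt_path)
```

A tab-separated text file like demo.txt is easy to turn into a regex list or import elsewhere, which is the use case the question was about.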


nest0r Wrote:Umm, what else. Uh, so, how can I turn this into a card maturity reminder plugin thingy?

Or perhaps there's a way to add a functionality where when all cards with a certain tag have been merged with the known.db per maturity threshold, it tells you, or tags cards whose Expression fields are entirely known, or cards that are mature?
If the expression field for the fact is entirely known, the i+N field will be 0.

I could make it so that if a card is mature the fact gets tagged in some way, but I don't think you can tag cards themselves in anki.

nest0r Wrote:Also, did you include that bit from the mod that lets you extract morphemes from larger files without MorphMan dying?
I believe so. I also took its vocab ranking algorithm and its use of base forms.

nest0r Wrote:Also, that dropdown list, how do you add stuff/remove stuff, or if it's automatic, how's it activated/how's it choose what decks to add?
The plugin uses Anki's recently opened decks list to decide what to try to scan. The dropdown box lists all decks that have a config file made for them (in anki_plugin_path/morph/decks/deck_name_here/config).

Note, they won't all be there until it finishes running (which may take a few minutes). The gui will report the last time a deck and its dbs were updated and the last time a full sync was completed or if it hasn't yet completed one since starting Anki.


Mighty Morphin Morphology - nest0r - 2011-08-24

overture2112 Wrote:EDIT: nest0r, what all else do you want done with the card maturity information?
Hmmm. My brain is spinning.

Here are some ideas, and you tell me how you think they'd best be achieved with MorphMan, because I feel like maybe we're already there and I just haven't grasped the functions/workflow possibilities:

You're learning RTK kanji. Rather than wait until you complete RTK Lite or RTK1, you'd like to start learning words, or switch to Japanese keywords, or start doing sentences, but only with kanji you know, when particular kanji cards or a set of cards (by Heisig lessons?) are mature.

You're learning some words. They're recognition cards. You're constantly adding new words as recognition cards. When these recognition cards are mature, you'd like to do them as production cards or cloze cards, etc., or use them outside Anki, perhaps in writing tasks, or use the list of these words in some other program.

Then there's the subs2srs vocab/video clip stuff I do, but the dynamic I think applies for anyone who wants to follow that dynamic of taking sentence cards, first noting the unknowns and doing those unknowns as their own cards, then going back to the sentence cards: For me, per the outline I mentioned in the other thread, I think at this point it's just a matter of unsuspending the video clips cards of the same fact when the vocab card of that fact becomes mature.

Edit: Wrote this before reading your above comment, so I might need to edit stuff.

Edit 2: Here's what you said about new cards: “sorts your new card order (via card creation time manipulation) for you” - How's that work? Guess I should read the readme. ;p I assume it doesn't mess with Study Options settings, like Review New in Random Order and SelectiveStudy stuff?

Edit 3: When you talk about the expandable information such as line number and stuff, I take it you're referring to being able to do novels and articles sentence by sentence or paragraph by paragraph? That certainly would be nice. What about finding and possibly sorting by the unknowns per sentence/paragraph in any given text via punctuation like 。?

Edit 4: Oh, right, listing in the GUI then copying, that works, re: the .db format. (Mentally deprioritizing this now. ;p) If nothing else, re: maturity reminders, just... something to remove the task of having to constantly remind yourself and having to manually check by sorting by interval or whathaveyou whether cards are mature would be great. Even though I'm always harping on about this and no one says anything, I swear when you get that happening people will find a ton of uses for it.

Anyway thanks for making this tool, and thanks for reading/considering my rambling proposals regarding it. ;p


Mighty Morphin Morphology - overture2112 - 2011-08-24

nest0r Wrote:You're learning RTK kanji. Rather than wait until you complete RTK Lite or RTK1, you'd like to start learning words, or switch to Japanese keywords, or start doing sentences, but only with kanji you know, when particular kanji cards or a set of cards (by Heisig lessons?) are mature.

You're learning some words. They're recognition cards. You're constantly adding new words as recognition cards. When these recognition cards are mature, you'd like to do them as production cards or cloze cards, etc., or use them outside Anki, perhaps in writing tasks, or use the list of these words in some other program.

Then there's the subs2srs vocab/video clip stuff I do, but the dynamic I think applies for anyone who wants to follow that dynamic of taking sentence cards, first noting the unknowns and doing those unknowns as their own cards, then going back to the sentence cards: For me, per the outline I mentioned in the other thread, I think at this point it's just a matter of unsuspending the video clips cards of the same fact when the vocab card of that fact becomes mature.
I'm not sure how to tackle the problem generally. Perhaps for the last case you can write something like

Code:
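if fact is mature:
  # first pass: is any recognition card of this fact mature?
  readyForProduction = False
  for card belonging to fact:
    if card has tag "recognition" and card is mature:
        readyForProduction = True
  # second pass: only then unsuspend the production cards
  if readyForProduction:
    for card belonging to fact:
      if card has tag "production":
          unsuspend card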
where the tags for the 2 types of cards are customizable.

nest0r Wrote:Edit 2: Here's what you said about new cards: “sorts your new card order (via card creation time manipulation) for you” - How's that work? Guess I should read the readme. ;p I assume it doesn't mess with Study Options settings, like Review New in Random Order and SelectiveStudy stuff?
The study option 'show new cards in order added' or whatever shows new cards according to when the card was created. So this just changes the time stamp on each card to make it enforce the ordering according to morph man index.
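
Conceptually it is something like the sketch below. The dict keys are invented for illustration and this is not the plugin's code; it also assumes a lower morph man index means the card should be shown earlier.

```python
def reorder_by_index(cards, start_time, step=1.0):
    """Return copies of cards whose 'created' timestamps follow 'mm_index' order.

    cards      -- list of dicts with hypothetical keys 'id', 'mm_index', 'created'
    start_time -- timestamp given to the first card in the new order
    step       -- seconds between consecutive cards
    """
    ordered = sorted(cards, key=lambda card: card["mm_index"])
    # Hand out strictly increasing timestamps in index order.
    return [
        dict(card, created=start_time + position * step)
        for position, card in enumerate(ordered)
    ]

# Made-up cards: creation order is a, b, c but the index order is b, c, a.
cards = [
    {"id": "a", "mm_index": 30, "created": 100.0},
    {"id": "b", "mm_index": 10, "created": 200.0},
    {"id": "c", "mm_index": 20, "created": 300.0},
]
reordered = reorder_by_index(cards, start_time=1000.0)
shown = [card["id"] for card in sorted(reordered, key=lambda card: card["created"])]
print(shown)  # ['b', 'c', 'a']
```

Anything that shows new cards by creation time then follows the index order without needing to know about morph man at all.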

nest0r Wrote:Edit 3: When you talk about the expandable information such as line number and stuff, I take it you're referring to being able to do novels and articles sentence by sentence or paragraph by paragraph? That certainly would be nice. What about finding and possibly sorting by the unknowns per sentence/paragraph in any given text via punctuation like 。?
Yea, I figured maybe it'd be useful if you import directly from a webpage, a novel, etc and we want to do analysis on it later on.


Mighty Morphin Morphology - nest0r - 2011-08-24

That would be cool if you can get that in plugin format (re: the code).

Even just a reminder that cards have become mature (or even known if you don't want to mess with mature threshold, or both?), like a little popup log that has a list of just-turned mature cards and/or their tags if all the cards of a tag are mature, and then you can do with that information what you will, e.g. unsuspend cards of the same facts with different layouts, export them, edit the layout, suspend them, whathaveyou.


Mighty Morphin Morphology - overture2112 - 2011-08-24

nest0r Wrote:like a little popup log that has a list of just-turned mature cards
How would you identify a card? There's the internal id number (useless to you) or you could have it tell you the value of some field of the fact it belongs to and the card type (slightly more useful?).