Mighty Morphin Morphology - Printable Version
kanji koohii FORUM (http://forum.koohii.com) > Forum: Learning Japanese (http://forum.koohii.com/forum-4.html) > Forum: Learning resources (http://forum.koohii.com/forum-9.html) > Thread: Mighty Morphin Morphology (/thread-7486.html)
Mighty Morphin Morphology - overture2112 - 2012-03-20

As Anki 2 is getting closer to a release, I'm going to start writing Morph Man 3, as the new database/deck design necessitates an update. That said, I'm also looking to add new features or make improvements while I'm at it. Aside from some obvious errors/limitations to correct (like some inconsistencies where a few paths lacked Unicode support), some things that I personally ran into were:

1) When learning new i+1 sentences, I'd like to prioritize learning the most common morphemes I don't know. Currently the Morph Man Index prioritizes number of unknowns, then length of sentence, then vocab rank, so this would mean either sorting by morpheme popularity instead of vocab rank or coming up with some formula that takes both into account a little bit.

2) When learning new i+1 sentences, I'd like to avoid seeing a sentence where the "1" is a morpheme I just learned since I started this batch of new cards. For example, with a new show it's not uncommon to see a handful of sentences in a row containing a person's name. Of course, if I close the deck and let MorphMan update, it'll determine that those "duplicates" are now i+0; it's just that MM needs to skip new cards with morphemes already seen since the last full deck update.

3) A nice way to retire sentences whose morphemes are now covered by another sentence you've learned. For example, I no longer have need for the sentences I used to learn the components that make up 問おう、貴方が私のマスターか, especially since this one is hundreds of times more memorable. You could even apply the set covering problem to find the minimum subset of your sentences that keeps all the morphemes you know, hopefully allowing you to purge your deck of now "useless"/trivial sentences (perhaps only applying this to mature cards). While it doesn't take a great deal of time to maintain cards once they've become mature, that time certainly adds up when you have thousands.
4) Morph Man Index-based sorting globally across all decks. In Morph Man 2, cards were sorted by MMI and it was easy to study all the new i+1 sentences for a deck, but you couldn't easily study new i+1 sentences across all decks (and in optimal order). Also, if you wanted to switch to a different deck, you first had to wait for it to update the i+N information for the second deck after seeing the changes to the first deck, which was an annoying delay. With Anki 2's new deck/subdeck system and new "new card" sorting mechanism, you could just study a deck and all its subdecks at once, with optimal ordering. This one seems pretty trivial to implement now.

Mighty Morphin Morphology - syntoad - 2012-03-22

Since there is a rewrite in the works, I would like to throw in a feature request as well. I would love to get a list of unknown morphemes sorted by frequency. It could help optimize learning further by showing which words I need to learn to make the most cards become i+1.

Mighty Morphin Morphology - overture2112 - 2012-03-22

syntoad Wrote: I would love to get a list of unknown morphemes sorted by frequency. It could help optimize learning further by showing which words I need to learn to make the most cards become i+1.

Ah, I definitely wanted to use that information to help improve the Morph Man Index (and thus new card order) so I'm always learning the most useful words as per (1) above, but I didn't consider exposing it as well. Being able to create a word frequency list from any Morph Man DB (and thus any combination of decks) would be pretty neat too, in addition to just the unknowns frequency list. Then we could finally stop using those silly premade word frequency lists based on newspapers and business articles when doing analysis. Btw, did you have any particular ideas as to leveraging such a list of unknowns?

Mighty Morphin Morphology - qwarten - 2012-03-24

In relation to the frequency "awareness", I'd like to know how possible or useful this might be.
Suppose that I started using Morph Man later in my studies, and say I have read about 10 light novels. Also suppose that my vocab deck is not very large (~3000 facts) but I know more, possibly twice that amount. Now when I use Morph Man, there are a lot of verbs and vocabulary items that I may know in several forms but that are listed in only one form in my deck. This would add quite a lot of clutter to sift through, even with a frequency-sorted list. So, perhaps a function to recognize frequently occurring items/words in a previously read book or books which we are confident with as known items/words would be nice. This would enable me to look for really unknown words in a new book. Of course, the frequency level (e.g. occurred more than 20 times, 10 times, etc.) should be definable. I suspect the above could be done with an independent script, but I don't know much about programming, so it is beyond my abilities at this time.

Mighty Morphin Morphology - overture2112 - 2012-03-24

qwarten Wrote: So, perhaps a function to recognize frequently occurring items/words in a previously read book or books which we are confident with as known items/words would be nice.

So you use the text file importer to get morpheme DBs for the light novels, combine them all into one DB of "morphemes I've seen during my readings", filter based on frequency to make a DB of "morphemes I've seen during my readings at least N times, and thus most likely known to me", then merge that into your known.db? That seems easy enough and fairly useful.

Mighty Morphin Morphology - gwedig - 2012-03-24

If we're talking features, here are some things I've often wanted. For a given morpheme (or all the morphemes on a card) I'd like to see a breakdown of how many cards each of Known, Mature and Total. I find that with a lot of morphemes, until I have one on multiple cards, I don't *really* learn it (I can't recognize it in other contexts), and this would help find those gaps.
It would also be nice to see the highest interval for the morpheme in this table.

It would also be nice to find all the cards for a given morpheme. This could easily be done if you just added a field with *all* the morphemes on the card, in dictionary form, separated out. This would allow searching for verbs, adjectives, kana forms that often come up in other words or as particles, etc. I would use this to find additional cards with low i+N that could be added cheaply to increase the number of times difficult morphemes are shown.

Mighty Morphin Morphology - overture2112 - 2012-03-24

gwedig Wrote: For a given morpheme (or all the morphemes on a card) I'd like to see a breakdown of how many cards each of Known, Mature and Total... It would also be nice to see the highest interval for the morpheme in this table.

Right now Morph Man 2 thinks only in terms of facts plus the highest maturity amongst all cards for that fact, on the theory that cards sharing the same fact (sibling cards) aren't worth distinguishing. I was considering making it based on cards instead of facts, but I haven't fully thought out the implications yet (e.g., do you count a morpheme in 3 facts with 4 cards each as 12 occurrences, or as 3 as before?).

gwedig Wrote: It would also be nice to find all the cards for a given morpheme... field with *all* the morphemes on the card in dictionary form... This would allow searching for verbs, adjectives, kana forms that often come up in other words or as particles, etc.

Good idea.

Mighty Morphin Morphology - overture2112 - 2012-03-24

overture2112 Wrote: 1) When learning new i+1 sentences, I'd like to prioritize learning the most common morphemes I don't know.

So I'm trying to come up with a new formula for the Morph Man Index, but I'm unsure how to weigh the frequency of a morpheme against the length of the sentence or the number of morphemes in it (since you want to avoid very short sentences that teach a rare word or very long sentences that teach a common word).
As such, I'm taking suggestions for a new formula. Previously the formula was:

Code:
mmi = 10000*N_k + 1000*N_morphs + vr

6135 unique morphemes with 60984 total occurrences:

9 occur >=1000 times for 11485/60984 = 18.8%
89 occur >=100 times for 32590/60984 = 53.4%
169 occur >=50 times for 38301/60984 = 62.8%
724 occur >=10 times for 49519/60984 = 81.2%
1314 occur >=5 times for 53353/60984 = 87.5%
3072 occur >=2 times for 57921/60984 = 95.0%

9 occur >=1000 and <=5000 times for 11485/60984 = 18.8%
81 occur >=100 and <=1000 times for 22105/60984 = 36.2%
638 occur >=10 and <=100 times for 17229/60984 = 28.3%
2402 occur >=2 and <=10 times for 8942/60984 = 14.7%
3063 occur >=1 and <=1 times for 3063/60984 = 5.0%

Mighty Morphin Morphology - syntoad - 2012-03-25

overture2112 Wrote: Btw, did you have any particular ideas as to leveraging such a list of unknowns?

Actually, I think the best thing would be the ability to quickly mark a word as known. For example, let's say I already know the word 最後, so I never bothered making a card for it. But this word happens to be holding back about 20 cards from being i+1. If, with a few clicks, this word could become known, it could make the whole learning process a lot faster. Or, in the case that it is a word I didn't know, it would be trivial to then look up that word and find a few good sentences to add, making it naturally become mature as I learn it, and then unlocking my other sentences.

Mighty Morphin Morphology - Korvar - 2012-03-27

Following the instructions at http://ankisrs.net/docs/JapaneseSupport.html#Ubuntu fixed my problem. Specifically, I did:

Code:
sudo apt-get remove mecab-ipadic mecab kakasi

So now I have it working, a question: does it maintain separate DBs for each deck, or are they all combined?

Mighty Morphin Morphology - overture2112 - 2012-03-27

Korvar Wrote: So now I have it working, a question: Does it maintain separate DBs for each deck, or are they all combined?

Both, sort of.
Morph Man 2 creates a directory for each deck which contains an all.db (every morpheme in the deck) and interval.N.db files, which store all morphemes with intervals of at least N (so interval.21.db is all your "matured" morphemes in that deck). Additionally, there are a global known.db and mature.db, which store all your known and mature morphemes across all your decks. You can also manually merge other databases into these to inform MorphMan of knowledge outside Anki (e.g., take a novel or just a random text file full of words you know, use the text file import feature to make a DB, then merge that DB into known/mature.db via the union mechanism). I'm still playing around with the format for Morph Man 3, as Anki 2 isn't designed around decks anymore and some significant speed improvements allow more interesting designs.

Mighty Morphin Morphology - gombost - 2012-03-29

Korvar Wrote: Following the instructions at: http://ankisrs.net/docs/JapaneseSupport.html#Ubuntu

My Japanese support plug-in was installed incorrectly too, but the Morph Man plug-in still doesn't work after reinstallation, although this time Anki's error message is different:

Traceback (most recent call last):
  File "/home/tamas/.anki/plugins/morph/util.py", line 97, in
    ed.connect( a, SIGNAL('triggered()'), lambda e=ed: doOnSelection( e, overviewMsg, progMsg, preF, perF, postF ) )
  File "/home/tamas/.anki/plugins/morph/util.py", line 82, in doOnSelection
    st = perF( st, f )
  File "/home/tamas/.anki/plugins/morph/exportMorphemes.py", line 25, in per
    ms = M.getMorphemes( st['mp'], f[ fname ] )
  File "/usr/lib/python2.7/site-packages/anki/facts.py", line 89, in __getitem__
    raise KeyError(key)
KeyError: PyQt4.QtCore.QString(u'Expression')

python2 version: 2.7.2
mecab version: 0.993
mecab charset: euc-jp
The package mecab-jumandic is not installed.

Sorry for bothering you while you are rewriting the plug-in for version 3. Maybe I should just wait for it in Anki 2.
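overture2112's description of the DB layout (a per-deck all.db, a global known.db, merging via union) and qwarten's idea of treating morphemes seen at least N times in novels as known fit together in a few lines. The sketch below is an illustration only: it assumes a morpheme DB can be modeled as a `collections.Counter` mapping dictionary form to occurrence count, which is much simpler than the plugin's real .db files, and `db_from_morphemes`, `union`, `at_least` and `coverage_report` are hypothetical names, not Morph Man functions. It is written in Python 3, while the plugin at the time ran on Python 2.

```python
from collections import Counter
from typing import Iterable

# Hypothetical stand-in for a morpheme DB: morpheme (dictionary form) -> occurrences.
MorphDB = Counter


def db_from_morphemes(morphemes: Iterable[str]) -> MorphDB:
    """Build a DB from a stream of already-tokenized morphemes (e.g. MeCab output)."""
    return Counter(morphemes)


def union(*dbs: MorphDB) -> MorphDB:
    """Merge several DBs (e.g. one per novel), summing occurrence counts."""
    merged: MorphDB = Counter()
    for db in dbs:
        merged.update(db)  # Counter.update adds counts instead of replacing them
    return merged


def at_least(db: MorphDB, n: int) -> MorphDB:
    """Keep only morphemes seen at least n times ("most likely known to me")."""
    return Counter({m: c for m, c in db.items() if c >= n})


def coverage_report(db: MorphDB, thresholds: Iterable[int]) -> list[str]:
    """For each threshold k, count morphemes occurring >= k times and the share
    of all occurrences they account for, in the same style as the stats above."""
    total = sum(db.values())
    lines = [f"{len(db)} unique morphemes with {total} total occurrences"]
    for k in thresholds:
        kept = at_least(db, k)
        covered = sum(kept.values())
        lines.append(f"{len(kept)} occur >={k} times for {covered}/{total} = {100 * covered / total:.1f}%")
    return lines


if __name__ == "__main__":
    novel_a = db_from_morphemes(["食べる", "猫", "食べる", "行く"])
    novel_b = db_from_morphemes(["食べる", "行く", "犬"])
    readings = union(novel_a, novel_b)
    # Fold "seen at least twice while reading" into an existing known DB.
    known = union(Counter({"私": 1}), at_least(readings, 2))
    print(sorted(known))
    print("\n".join(coverage_report(readings, [2, 1])))
```

The threshold passed to `at_least` plays the role of qwarten's user-definable frequency level; the same `coverage_report` run over a whole collection produces lines like the frequency statistics posted earlier in the thread.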
Mighty Morphin Morphology - overture2112 - 2012-03-29

gombost Wrote: File "/home/tamas/.anki/plugins/morph/exportMorphemes.py", line 25, in per

At least one of the facts you're trying to export morphemes from doesn't have a field called "Expression".

Mighty Morphin Morphology - gombost - 2012-03-29

overture2112 Wrote: At least one of the facts you're trying to export morphemes from doesn't have a field called "Expression".

Will it work if I rename the field called "Reading", which contains hiragana in brackets after each kanji or kanji compound?

EDIT: OK, I get it. In the popup dialog I have to type the name of the field that the plug-in should examine.

Mighty Morphin Morphology - Korvar - 2012-04-05

I thought it was possible to change the field that is checked? So it doesn't have to be "Expression"? Or is that hard-coded?

Edit: Hm. Okay, only one of my decks has the "Japanese" model, with the fields MorphMan appears to expect... okay. Let's work with that for a bit and ignore the others for now.

Mighty Morphin Morphology - vix86 - 2012-04-07

I don't know how strong your Comp Sci background is, Overture, but have you looked into any algorithms for finding optimal learning paths in a set of facts? That is, given a set of known morphemes and a set of unknown morphemes, you start at some point in the unknown set and build a progressive index through the set. That way you can export the cards, sort the list and reimport with the optimal route.

My first question was mostly because this question/feature is really an NP-hard problem, a classic traveling-salesman-type issue. I believe there are some decent algorithms for solving traveling-salesman-type problems; how applicable they are to something as conditional as morpheme learning, I'm not sure. I suppose you could take the i+1 sentences, enumerate them and treat them as "hypothetically known", then run again and do the same thing with those that now show up as i+1, repeating until there are no more cards.
This is assuming you'll constantly be ending up with i+1 cards, though, which might not be the case.

Mighty Morphin Morphology - overture2112 - 2012-04-07

vix86 Wrote: I don't know how strong your Comp Sci background is, Overture, but have you looked into any algorithms for finding optimal learning paths in a set of facts? That is, given a set of known morphemes and a set of unknown morphemes, you start at some point in the unknown set and build a progressive index through the set. That way you can export the cards, sort the list and reimport with the optimal route.

If you're only learning i+1 sentences, then the distance between any two valid sentences is 1, and so any complete path is an optimal path. That is, MorphMan's ordering will have you learn all N new morphemes in your deck in the minimal number of sentences (i.e., N), so long as you have enough sentences to guarantee there is always an i+1 sentence while there are more morphemes to learn (see below).

vix86 Wrote: I suppose you could take the i+1 sentences, enumerate them and treat them as "hypothetically known", then run again and do the same thing with those that now show up as i+1, repeating until there are no more cards. This is assuming you'll constantly be ending up with i+1 cards, though, which might not be the case.

I've considered doing this to find if you'll ever run out of i+1 sentences yet still have some unlearned facts remaining, in order to know whether you need to add more to your deck(s) to prevent having to use i+2 or worse sentences. In the end I decided to just make more subs2srs decks instead.

Anyway, assuming we have enough i+1 cards, we'll always learn the minimal number of sentences needed to learn all the morphemes in our deck, so the more useful issues to optimize are:

1) Difficulty of learning a sentence. Usually you'll get multiple i+1 sentences that all teach the same new word, but which one(s) do you choose to use and which do you just suspend?
The new MorphMan factors in sentence length more intelligently (e.g., it avoids sentences which are too long or too short), and I'm considering porting over the vocab rank stuff (i.e., whether you already know the readings for the kanji in a morpheme), but it never worked great and I'm not happy with any formula I've come up with that utilizes it.

2) Short-term learning. While I'd eventually like to learn all the morphemes in my deck, which are the next 500 I should focus on that would be most useful to me? The new MorphMan sorts i+1 sentences by how common the "focus" morpheme is within your all.db (thus your Anki collection plus other sources you've exported, like novels and articles). That's a good start, but it should probably factor in part of speech as well, given how much more important verbs seem to be.

Mighty Morphin Morphology - vix86 - 2012-04-07

overture2112 Wrote: I've considered doing this to find if you'll ever run out of i+1 sentences yet still have some unlearned facts remaining, in order to know whether you need to add more to your deck(s) to prevent having to use i+2 or worse sentences. In the end I decided to just make more subs2srs decks instead.

I suppose it's really only an issue when you start out and your vocabulary is small, so you'll have very few i+1s and may end up with a large number of i+2s and up. In that case you'd either find more i+1s or struggle through i+2s. If you struggled through i+2s, then you'd probably want to pick the sentences which would lead to progressively more i+1 sentences in a kind of avalanche style. Finding the i+2 which would create the largest "avalanche" of continual i+1s (since those new sentences would give rise to more i+1 sentences) would be difficult, I think. The real-world solution would be to just do some of the Core decks, which would give you a good padding of known morphemes.

#2 sounds like a great idea. More bang for your buck/time, that kind of deal.
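overture2112's points 1) and 2) above (avoid sentences that are too long or too short, and prefer i+1 sentences whose focus morpheme is common in all.db), compared with the old `mmi = 10000*N_k + 1000*N_morphs + vr`, can be sketched as a sort key. This is a hypothetical scoring function, not Morph Man 3's actual formula: the `Sentence` type, the ideal length window of 4 to 12 morphemes, the two weights and the log damping on frequency are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    morphemes: list[str]  # every morpheme in the sentence, in dictionary form


def mmi_key(s: Sentence, known: set[str], freq: Counter,
            ideal_len: tuple[int, int] = (4, 12),
            length_weight: float = 1.0, freq_weight: float = 2.0) -> tuple[int, float]:
    """Hypothetical sort key; lower sorts first.

    1. Fewest unknown morphemes first (i+1 before i+2), like the old 10000*N_k term.
    2. Then a blended score: a penalty for each morpheme outside the ideal length
       window, minus a log-damped bonus for how common the rarest unknown
       ("focus") morpheme is in all.db.
    """
    unknowns = {m for m in s.morphemes if m not in known}
    lo, hi = ideal_len
    n = len(s.morphemes)
    length_penalty = max(0, lo - n) + max(0, n - hi)
    focus_freq = min((freq[m] for m in unknowns), default=0)  # Counter returns 0 if unseen
    blended = length_weight * length_penalty - freq_weight * math.log1p(focus_freq)
    return (len(unknowns), blended)


if __name__ == "__main__":
    known = {"私", "は", "です"}
    freq = Counter({"猫": 50, "鰐": 1, "犬": 40})
    cards = [
        Sentence("私は鰐です", ["私", "は", "鰐", "です"]),
        Sentence("猫と犬", ["猫", "と", "犬"]),
        Sentence("私は猫です", ["私", "は", "猫", "です"]),
    ]
    for card in sorted(cards, key=lambda c: mmi_key(c, known, freq)):
        print(card.text)
```

Keeping the unknown count as the first element of the tuple keeps "fewest unknowns first" strict, while length and frequency trade off against each other in the second element; the log damping stops one very common focus morpheme from outweighing an extremely long sentence. A part-of-speech weight, as suggested for verbs, would be one more term in `blended`.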
Mighty Morphin Morphology - overture2112 - 2012-04-07

vix86 Wrote: I suppose it's really only an issue when you start out and your vocabulary is small, so you'll have very few i+1s and may end up with a large number of i+2s and up.

I found this wasn't a problem after I added a K-On! season 1 subs2srs deck. Adding a single season of a show drastically increases your i+1 selection; even in my tests with an empty known.db and reset decks, you could do at least a few hundred (I didn't check beyond that).

Mighty Morphin Morphology - Daichi - 2012-06-16

overture2112 Wrote: I've considered doing this to find if you'll ever run out of i+1 sentences yet still have some unlearned facts remaining, in order to know whether you need to add more to your deck(s) to prevent having to use i+2 or worse sentences. In the end I decided to just make more subs2srs decks instead.

I'll just point out I've seen morphemes sometimes show up only in pairs. Meaning, these will always be i+2 if I have no other cases of these morphemes. Sometimes these are just simple but very specific words, like a full name only said by itself, so adding more sentences would never solve the problem. So maybe a little bit of pair checking could be done, to give these a higher priority than normal i+2 sentences. (Also, I recall that in at least the original Morph Man, the same word appearing several times in a sentence would count as a separate unknown each time.)

So Anki 2 is now starting to look very mature; how is your progress so far on Morph Man 3?
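vix86's "hypothetically known" idea and Daichi's pair check combine naturally: greedily learn any i+1 sentence, mark its new morpheme as known, repeat until no i+1 sentence is left, then flag the leftover i+2 sentences whose two unknowns never appear apart anywhere in the deck. This is a minimal sketch under stated assumptions: sentences are plain sets of dictionary-form morphemes (so repeated words count once, which also sidesteps the double-counting Daichi mentions), and `simulate` and `locked_pairs` are hypothetical helpers, not part of Morph Man.

```python
def simulate(sentences: list[set[str]], known: set[str]) -> tuple[list[int], list[int], set[str]]:
    """Greedily 'learn' i+1 sentences until none are left.

    Returns (learning order, indices still i+2 or worse, final known set).
    """
    known = set(known)  # work on a copy so the caller's set is untouched
    order: list[int] = []
    pending = set(range(len(sentences)))
    progress = True
    while progress:
        progress = False
        for i in sorted(pending):  # iterate over a snapshot; pending shrinks below
            unknown = sentences[i] - known
            if len(unknown) == 1:  # i+1: learn its single new morpheme
                known |= unknown
                order.append(i)
                pending.discard(i)
                progress = True
            elif not unknown:  # became i+0: nothing left to learn from it
                pending.discard(i)
    return order, sorted(pending), known


def locked_pairs(sentences: list[set[str]], known: set[str]) -> list[int]:
    """i+2 sentences whose two unknowns always occur together in this deck,
    so no other sentence here can ever teach one of them first."""
    locked = []
    for i, s in enumerate(sentences):
        unknown = s - known
        if len(unknown) != 2:
            continue
        a, b = unknown
        if all((a in t) == (b in t) for t in sentences):
            locked.append(i)
    return locked


if __name__ == "__main__":
    deck = [{"私", "猫"}, {"猫", "犬"}, {"山田", "太郎"}, {"鰐", "象"}, {"鰐", "熊", "私"}]
    order, stuck, known = simulate(deck, {"私"})
    print(order, stuck, locked_pairs(deck, known))
```

`simulate` visits sentences in deck order rather than MMI order, but because learning only ever adds to the known set, the set of morphemes it ends up knowing does not depend on that order. A non-empty `stuck` list answers overture2112's "will I run out of i+1 sentences" question, and `locked_pairs` picks out the i+2 sentences Daichi suggests ranking above ordinary i+2s.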
Mighty Morphin Morphology - overture2112 - 2012-06-16

Daichi Wrote: I'll just point out I've seen morphemes sometimes show up only in pairs. Meaning, these will always be i+2 if I have no other cases of these morphemes. Sometimes these are just simple but very specific words, like a full name only said by itself, so adding more sentences would never solve the problem...

That's an interesting point wrt pairs; I'll have to brainstorm how best to handle those, but hopefully Morph Man 3's use of morpheme popularity would at least rank them highly amongst the i+2s.

Morph Man 3 has the basics completed and just needs polish. I should be able to make an initial release soon now that I'm back in the US.

Mighty Morphin Morphology - MonjaIsshin - 2012-09-13

Maybe this has been explained somewhere, and I am not a programmer, so please bear with me... I am trying to use Morph Man 2 (which looks like a fantastic plug-in!) for the first time and keep getting an error message when I open Anki. I believe I have successfully created the known.db and I have added the iPlusN field. The various folders (RTK, Tanuki-Ultima, etc.) have config files; the Japanese Mimetic Expressions folder does not. I tried copying one of the other config files to that folder but continue to get the error message, so I guess the plug-in didn't like that config file...
Also, when I open a deck, select some cards and go to Actions > It's Morphin Time, the option to set i+N does not show up. Here's the error message:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\cygwin\home\dae\Home\anki\win\build\pyi.win32\anki\outPYZ1.pyz/threading", line 530, in __bootstrap_inner
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 494, in run
    run()
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 449, in run
    dm = DeckMgr( deck )
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 82, in __init__
    self.loadCfg()
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 103, in loadCfg
    self.saveCfg()
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 108, in saveCfg
    f = gzip.open( self.cfgPath, 'wb' )
  File "C:\cygwin\home\dae\Home\anki\win\build\pyi.win32\anki\outPYZ1.pyz/gzip", line 34, in open
  File "C:\cygwin\home\dae\Home\anki\win\build\pyi.win32\anki\outPYZ1.pyz/gzip", line 89, in __init__
IOError: [Errno 2] No such file or directory: 'C:\\Users\\Monja\\AppData\\Roaming\\.anki\\plugins\\morph\\dbs\\deck\\Japanese Mimetic Expressions \\config'

What am I doing wrong, or what is missing on my computer? Thank you!

Mighty Morphin Morphology - overture2112 - 2012-09-13

MonjaIsshin Wrote: dm = DeckMgr( deck )

Is there supposed to be a space after the name of your deck? It seems to think it's "Japanese Mimetic Expressions " and not "Japanese Mimetic Expressions", which I'm guessing may be the issue.

Mighty Morphin Morphology - MonjaIsshin - 2012-09-14

Thank you for your reply, overture. The space was actually in the file name the way it installed when I downloaded it from shared files. Apparently, Morph Man doesn't like file names that end with a space. So I renamed the file in question. But even so, it didn't seem to recognize that the name had been changed and kept giving the same error message.
So I tried deactivating the plug-in and, after restarting Anki (which I have been closing and restarting at just about every step of the way), downloaded Morph Man again to see if that would work. Well, it sort of worked: it didn't complain about THAT file anymore, but it found another file name ending with a space (which is the way it installed at download). Even though I have renamed the file, and checked just about everything I can to avoid this happening again, NOTHING I have been able to do has gotten the plug-in past this new error message. Where on earth does it store its instructions, so that it just keeps looking for the same stuff???? I am not a programmer and do not understand these things, despite being a skilled user. I'm just losing too much time hassling with this. And I still haven't seen the option to "set iPlusN"... HELP!!!!

Here's the new error message:

There was an error with a plugin. Please contact its author. Please do not report the bug to Anki.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\cygwin\home\dae\Home\anki\win\build\pyi.win32\anki\outPYZ1.pyz/threading", line 530, in __bootstrap_inner
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 494, in run
    run()
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 449, in run
    dm = DeckMgr( deck )
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 82, in __init__
    self.loadCfg()
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 103, in loadCfg
    self.saveCfg()
  File "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph\auto.py", line 108, in saveCfg
    f = gzip.open( self.cfgPath, 'wb' )
  File "C:\cygwin\home\dae\Home\anki\win\build\pyi.win32\anki\outPYZ1.pyz/gzip", line 34, in open
  File "C:\cygwin\home\dae\Home\anki\win\build\pyi.win32\anki\outPYZ1.pyz/gzip", line 89, in __init__
IOError: [Errno 2] No such file or directory:
'C:\\Users\\Monja\\AppData\\Roaming\\.anki\\plugins\\morph\\dbs\\deck\\8547 Japanese Sentences - from the \\config'

Thank you!

Mighty Morphin Morphology - overture2112 - 2012-09-14

The plugin's data should all be stored in "C:\Users\Monja\AppData\Roaming\.anki\plugins\morph", so if you delete 'morph' and 'morphMan.py' in your plugins directory you should be able to reinstall cleanly. You're not seeing the set iPlusN features (or probably the It's Morphin Time menu at all) because the plugin is crashing before it creates the new menu.
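Both tracebacks above fail when `saveCfg` calls `gzip.open( self.cfgPath, 'wb' )` on a `dbs\deck\<deck name>\config` path whose deck name ends in a space. One way a plugin could avoid this is to sanitize deck names before using them as folder names and to create the folder before writing. The sketch below is hypothetical (`safe_dir_name`, `deck_config_path` and `open_config_for_writing` are not Morph Man functions) and is written for Python 3 (`exist_ok` does not exist in Python 2); the replaced characters are the ones Windows forbids in file names, and trailing spaces and periods are stripped because Windows does not handle names ending in them reliably.

```python
import gzip
import os
import re

# Characters Windows forbids in file and folder names.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*]')


def safe_dir_name(deck_name: str) -> str:
    """Hypothetical helper: turn a deck name into a Windows-safe folder name."""
    name = _FORBIDDEN.sub("_", deck_name)
    # Names ending in a space or a period are unreliable on Windows, as with the
    # "Japanese Mimetic Expressions " folder in the traceback above.
    name = name.rstrip(" .")
    return name or "_"  # never produce an empty folder name


def deck_config_path(plugin_dir: str, deck_name: str) -> str:
    """<plugin_dir>/dbs/deck/<safe name>/config, mirroring the path in the traceback."""
    return os.path.join(plugin_dir, "dbs", "deck", safe_dir_name(deck_name), "config")


def open_config_for_writing(plugin_dir: str, deck_name: str):
    """Create the deck folder if it is missing, then open its gzipped config for writing."""
    path = deck_config_path(plugin_dir, deck_name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return gzip.open(path, "wb")
```

With this approach two deck names that differ only in trailing spaces or periods would share one folder, so a real fix would also need a policy for such collisions.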