I wrote an Anki plugin to choose vocab for an RTK deck, to add extra information to the question without actually switching over to Japanese keywords. The aim is to prioritise known words and common words, and avoid putting words on the front which might be misleading (but if they're known or common, it's nice to have them on the back).
It loads the user's MorphMan DBs to determine what is "known".
By default, a word is "common" if the expression and reading are both tagged with JMDICT "nf10" or less (first 5000 words in "news1"). It's "not misleading" if either there is only one possible kanji to fill in the blank from the whole of JMDICT; or the answer (expression and reading) is tagged "gai1"/"ichi1"/"news1"/"spec1" (the EDICT "P" tag) and the alternative answers have no JMDICT priority tag (the above plus "gai2"/"ichi2"/"news2"/"spec2").
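A minimal Python sketch of the two rules above, assuming each JMDICT entry has been flattened into a hypothetical `Candidate` record holding the expression, the reading, and the priority tags on each (the `ke_pri`/`re_pri` elements in the JMDICT XML). The names are illustrative only, not the plugin's actual code.

```python
import re
from dataclasses import dataclass, field

# Tags that together make up the EDICT "P" marker, as listed above.
P_TAGS = {"gai1", "ichi1", "news1", "spec1"}
# "Any JMDICT priority tag": the P tags plus their "2" counterparts.
# nfXX tags only appear alongside news1/news2, so they are not listed separately.
ANY_PRIORITY = P_TAGS | {"gai2", "ichi2", "news2", "spec2"}

NF_TAG = re.compile(r"nf(\d\d)")


@dataclass
class Candidate:
    """One expression/reading pair (hypothetical flattened JMDICT record)."""
    expression: str
    reading: str
    kanji_pri: set = field(default_factory=set)    # priority tags on the expression
    reading_pri: set = field(default_factory=set)  # priority tags on the reading


def nf_number(tags):
    """Return XX from an nfXX tag in `tags`, or None if there is no nf tag."""
    for tag in tags:
        match = NF_TAG.fullmatch(tag)
        if match:
            return int(match.group(1))
    return None


def is_common(word, max_nf=10):
    """Common = expression AND reading both tagged nfXX with XX <= max_nf."""
    numbers = (nf_number(word.kanji_pri), nf_number(word.reading_pri))
    return all(n is not None and n <= max_nf for n in numbers)


def is_not_misleading(answer, alternatives):
    """alternatives: other entries whose kanji could also fill the blank."""
    if not alternatives:
        # Only one possible kanji in the whole dictionary.
        return True
    # The answer's expression and reading must both carry a "P" tag...
    answer_is_p = bool(answer.kanji_pri & P_TAGS) and bool(answer.reading_pri & P_TAGS)
    # ...and no alternative may carry any priority tag at all.
    rivals_unmarked = not any(
        (alt.kanji_pri | alt.reading_pri) & ANY_PRIORITY for alt in alternatives
    )
    return answer_is_p and rivals_unmarked
```

Keeping `max_nf` as a parameter makes the "by default" threshold easy to change without touching the rest of the logic.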
Some examples are shown below. They use css classes: in the screenshots, the yellow highlight means "you should be able to read this", purple text means "not unique but you probably won't pick the wrong kanji", and red text means "risk of picking the wrong kanji". My current css (not pictured) uses green highlight for "you should be able to read this in kanji" and yellow highlight for "you should be able to read this in kana".
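One way the three screenshot styles could be attached to each word is sketched below. The class names `readable`, `nonunique` and `risky` and the `render` helper are hypothetical; the plugin's real class names are not given here. The highlight and the text colour are treated as independent, since a known or common word may still be misleading (and so end up on the back).

```python
from html import escape


def css_classes(readable, unique, not_misleading):
    """Pick hypothetical CSS class names matching the screenshot colours.

    readable: the word is known or common (yellow highlight)
    unique: only one kanji fits the blank
    not_misleading: result of the "not misleading" rule
    """
    classes = []
    if readable:
        classes.append("readable")  # yellow highlight
    if not unique:
        # purple text if the rule says it's safe, red text otherwise
        classes.append("nonunique" if not_misleading else "risky")
    return classes


def render(word, classes):
    """Wrap a word in a span carrying the chosen classes (plain text if none)."""
    if not classes:
        return escape(word)
    return '<span class="{}">{}</span>'.format(" ".join(classes), escape(word))
```

A matching stylesheet would then give `.readable` a yellow background, `.nonunique` purple text and `.risky` red text.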
If choosing the words by hand, it would be fine to put おる on the front of the card for 折 ("fold"), because 居る and 織る don't mean "fold". But automatic choices are never going to be perfect.
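The "fill in the blank" check can be sketched as follows: blank out the card's kanji in the word, then look for other dictionary entries with the same reading and the same surrounding text. This is an illustration under assumptions: `entries` is a hypothetical list of (expression, reading) pairs, the sample below is hand-written rather than taken from JMDICT, and "is a kanji" is approximated by the CJK Unified Ideographs block. As the おる example shows, a rule like this sees only the spelling, not the meaning.

```python
def is_kanji(ch):
    """Rough check: CJK Unified Ideographs block only (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff"


def blank_alternatives(expression, reading, target, entries):
    """Return the other kanji that could replace `target` in `expression`.

    entries: iterable of (expression, reading) pairs (hypothetical format).
    Only the first occurrence of the single-character `target` is blanked.
    """
    if len(target) != 1 or target not in expression:
        raise ValueError(f"{target!r} is not a single character in {expression!r}")
    idx = expression.index(target)
    prefix, suffix = expression[:idx], expression[idx + 1:]
    found = set()
    for other_expr, other_reading in entries:
        # Same reading, same length, same text around the blank.
        if other_reading != reading or len(other_expr) != len(expression):
            continue
        if other_expr.startswith(prefix) and other_expr.endswith(suffix):
            filler = other_expr[idx]
            if filler != target and is_kanji(filler):
                found.add(filler)
    return found


sample = [("居る", "おる"), ("織る", "おる"), ("折る", "おる"), ("降る", "ふる")]
print(sorted(blank_alternatives("折る", "おる", "折", sample)))  # ['居', '織']
```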
https://github.com/HelenFoster/misc/tree...kanjivocab
I'm working on a program to choose vocab for an RTK deck, to add extra information to the question without actually switching over to Japanese keywords. The aim is to prioritise known words and common words, and avoid putting words on the front which might be misleading (but if they're known or common, it's nice to have them on the back).
Currently, a word is "common" if the expression and reading are both tagged with JMDICT "nf15" or less (first 7500 words in "news1"). It's "not misleading" if either there is only one possible kanji to fill in the blank from the whole of JMDICT; or the answer (expression and reading) is tagged "gai1"/"ichi1"/"news1"/"spec1" (the EDICT "P" tag) and the alternative answers have no JMDICT priority tag (the above plus "gai2"/"ichi2"/"news2"/"spec2").
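The word counts follow from how JMDICT defines the nf tags: nfXX means the word is in the XX-th set of 500 words of the news frequency list (nf01 is the first 500, nf02 the next 500, and so on). So "nfN or less" covers the first 500 × N words. A small sketch of that arithmetic (function names are illustrative):

```python
WORDS_PER_NF_SET = 500  # each nfXX tag covers one set of 500 words


def words_covered(max_nf):
    """Number of top-ranked words included by the rule 'nfXX with XX <= max_nf'."""
    return max_nf * WORDS_PER_NF_SET


def nf_tag_for_rank(rank):
    """nf tag for a word at 1-based position `rank` in the frequency list."""
    return "nf{:02d}".format((rank - 1) // WORDS_PER_NF_SET + 1)


print(words_covered(10), words_covered(15))          # 5000 7500
print(nf_tag_for_rank(7500), nf_tag_for_rank(7501))  # nf15 nf16
```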
"Known" words are a messy hack at the moment. Not sure about the best way to do this.
Some examples are shown below. They use css classes: currently, the yellow highlight means "you should be able to read this", purple text means "not unique but you probably won't pick the wrong kanji", and red text means "risk of picking the wrong kanji".
Is this interesting? Would it be worthwhile for me to work this thing into a reusable state? (Has someone done this already and I missed it?)
![[Image: XPnDUYC.png]](http://i.imgur.com/XPnDUYC.png)
![[Image: epRjrEJ.png]](http://i.imgur.com/epRjrEJ.png)
![[Image: 9txYjdK.png]](http://i.imgur.com/9txYjdK.png)
![[Image: 5cOa3em.png]](http://i.imgur.com/5cOa3em.png)
![[Image: fX0p6da.png]](http://i.imgur.com/fX0p6da.png)
Edited: 2015-08-30, 4:00 pm
