I've been trying 'vocabRank', but I'm not sure it's doing what I expected.
As far as I can tell, the algorithm looks at each morpheme in a sentence and ranks it highly if it's in known.db exactly as is, and a bit less highly if it's partially known. So if a sentence is iPlusN=1 and has several known words, it's going to be ranked as pretty easy regardless of what the +1 is. In particular, sentences with kanji that occur a lot in known.db are ranked higher.
I guess that, to determine how difficult a new card is to learn, I'd rather count how much I don't know in a sentence than how much I do know.
What do you think of something like this, for example? Lower points = easier.
- exact match in known.db = 0 points
- no match and no kanji = 10 points
- kanji characters that match position and sound = 5 points
- or kanji characters that exist in known.db = 10 points
- or kanji characters that are all new = 20 points
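To make the list above concrete, here is a rough Python sketch of one way it could be scored. Everything about the data layout is an assumption for illustration: each morpheme is a list of (character, reading) pairs that are already aligned, and `Known` is a hypothetical stand-in for known.db, not the plugin's real format. The list doesn't say what happens when only some kanji are known, so this sketch treats that case as 10 points and reserves 20 for morphemes where every kanji is new.

```python
from dataclasses import dataclass, field

def is_kanji(ch):
    # Only the basic CJK Unified Ideographs block; good enough for a sketch.
    return "\u4e00" <= ch <= "\u9fff"

def surface(segments):
    # Join the characters of a morpheme back into its written form.
    return "".join(ch for ch, _ in segments)

@dataclass
class Known:
    """Hypothetical stand-in for known.db."""
    words: set = field(default_factory=set)      # written forms
    kanji: set = field(default_factory=set)      # every kanji seen
    positions: set = field(default_factory=set)  # (index, kanji, reading)

    def add(self, segments):
        self.words.add(surface(segments))
        for i, (ch, reading) in enumerate(segments):
            if is_kanji(ch):
                self.kanji.add(ch)
                self.positions.add((i, ch, reading))

def morpheme_cost(segments, known):
    if surface(segments) in known.words:
        return 0                                  # exact match
    kanji = [(i, ch, rd) for i, (ch, rd) in enumerate(segments) if is_kanji(ch)]
    if not kanji:
        return 10                                 # no match and no kanji
    if all(k in known.positions for k in kanji):
        return 5                                  # same position and sound
    if any(ch in known.kanji for _, ch, _ in kanji):
        return 10                                 # at least one kanji seen before
    return 20                                     # all kanji are new

def sentence_cost(morphemes, known):
    # Lower total = easier sentence.
    return sum(morpheme_cost(m, known) for m in morphemes)

known = Known()
known.add([("学", "がく"), ("習", "しゅう")])
known.add([("先", "せん"), ("生", "せい")])

sentence = [
    [("先", "せん"), ("生", "せい")],  # exact match
    [("は", "は")],                    # unknown, no kanji
    [("学", "がく"), ("生", "せい")],  # kanji match position and sound
]
print(sentence_cost(sentence, known))  # 0 + 10 + 5 = 15
```

The point of summing costs is that a sentence full of known words no longer gets a bonus; only the unknown parts add to the score.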
Edit: I implemented a slightly different version of the above that also takes into account how many times a kanji appears in your known.db, and sent the code to overture2112 to check.
I've also changed my code to use the dictionary form for all verbs and adjectives, and to pass them through MeCab a second time to get the readings. If anyone's interested I could share the modified plugin, and overture2112, if you're interested in merging it, I can send you that code too.
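As a rough illustration of the counting idea only (this is not the code sent to overture2112, and the discount formula is made up for the example), a kanji's cost could shrink the more often it shows up in known words. The names `kanji_counts` and `kanji_cost` and the numbers 20 and 5 are assumptions.

```python
from collections import Counter

def kanji_counts(known_words):
    # Count how often each kanji appears across the known written forms.
    return Counter(
        ch for word in known_words for ch in word
        if "\u4e00" <= ch <= "\u9fff"
    )

def kanji_cost(ch, counts, new_cost=20.0, floor=5.0):
    # Hypothetical discount: a never-seen kanji costs new_cost, and each
    # extra sighting lowers the cost, but never below floor.
    seen = counts[ch]
    if seen == 0:
        return new_cost
    return max(floor, new_cost / (1 + seen))

counts = kanji_counts(["学校", "学生", "先生"])
for ch in "猫校学":
    print(ch, kanji_cost(ch, counts))
```

With this shape, a kanji seen once costs half of a brand-new one, and frequently seen kanji bottom out at the floor instead of dropping to zero.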
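For anyone wondering why a second pass is needed, here is a small sketch that parses one line of MeCab output. It assumes the IPADIC feature layout (part of speech first, base form in the 7th field, reading in the 8th); other dictionaries order the fields differently, and `parse_ipadic_line` and `dictionary_form` are hypothetical helper names, not part of the plugin or of MeCab.

```python
def parse_ipadic_line(line):
    # A MeCab output line is "surface<TAB>comma-separated features".
    surface, _, features = line.partition("\t")
    fields = features.split(",")
    # Unknown words can have fewer fields, so guard every lookup.
    base = fields[6] if len(fields) > 6 and fields[6] != "*" else surface
    reading = fields[7] if len(fields) > 7 else None
    return {"surface": surface, "pos": fields[0], "base": base, "reading": reading}

def dictionary_form(line):
    # Use the base form for verbs and adjectives, the surface form otherwise.
    info = parse_ipadic_line(line)
    if info["pos"] in ("動詞", "形容詞"):
        return info["base"]
    return info["surface"]

line = "食べ\t動詞,自立,*,*,一段,連用形,食べる,タベ,タベ"
info = parse_ipadic_line(line)
print(dictionary_form(line), info["reading"])
```

The reading field describes the inflected surface form (タベ here), not the dictionary form 食べる, so getting the reading of the dictionary form means running that form through MeCab again.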
Edited: 2011-06-05, 2:24 am
