netsplitter Wrote:I figured I should give you an update on what I've been working on.
Awesome.
I had a bit of trouble understanding what it actually did until I read your readme, which might help others too. As I understand it, it finds the readings of the individual kanji in a given word.
netsplitter Wrote:Now, the real problem is what do I do with all this information? I'd love some ideas.
I guess this is what you are already thinking of, but I will say it anyway. With all this information, we (as in you) can figure out the n + 0.5. If you know which part of the reading belongs to which part of the word, then you can look for new words that have that kanji with that reading. You can follow the exact same model that Overture currently uses in the MMM plugin:
* We have a list of what we know (known.db)
* Now, not only do we have words we know, but we have individual kanji readings too.
* Look for words we don't know yet whose kanji readings are known (as much as possible).
This is awesome.
Example
My known db has:
- 出発 「しゅっぱつ」 (word)
| - 出 「しゅっ」 (reading)
| - 発 「ぱつ」 (reading)
- 自分 「じぶん」 (word)
| - 自 「じ」 (reading)
| - 分 「ぶん」 (reading)
So, some n + 0.5s might be (in order of preference):
1. 出自 「しゅっじ」(a perfect "0.5" match) - (all kanji are known and all readings are known, it is just the 'word' we need to learn)*
2. 自発 「じはつ」 (almost full "0.5" match) - (both readings are known, though 「ぱつ」 has become 「はつ」)
3. 自慢 「じまん」 (one character off a "0.5" match) - (there is one kanji whose reading is unknown)
4. 自費出版 「じひしゅっぱん」 (two characters off a "0.5" match) - (there are two kanji whose readings are unknown)
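To make the ordering above concrete, here is a rough Python sketch of how the ranking could work. Everything in it is my own assumption for illustration: the function names, the in-memory data layout (it does not reflect the real known.db format or the MMM plugin), and the idea of treating 「ぱつ」/「はつ」 as a "near" match by stripping the dakuten/handakuten marks. It also reuses the simplified readings from the example, including the not-quite-right one for 出自.

```python
import unicodedata

# Combining dakuten and handakuten, as produced by NFD decomposition.
VOICING_MARKS = {"\u3099", "\u309a"}

def strip_voicing(kana):
    """Drop dakuten/handakuten so that ぱつ and はつ compare equal."""
    decomposed = unicodedata.normalize("NFD", kana)
    return "".join(ch for ch in decomposed if ch not in VOICING_MARKS)

def build_known_readings(known_words):
    """Collect every reading seen for each kanji in the known words."""
    known = {}
    for _word, pairs in known_words:
        for kanji, reading in pairs:
            known.setdefault(kanji, set()).add(reading)
    return known

def score(pairs, known):
    """Return (unknown readings, near matches); lower sorts first."""
    unknown = near = 0
    for kanji, reading in pairs:
        readings = known.get(kanji, set())
        if reading in readings:
            continue  # exact reading already known
        if strip_voicing(reading) in {strip_voicing(r) for r in readings}:
            near += 1  # known reading apart from voicing
        else:
            unknown += 1
    return (unknown, near)

# Hypothetical layout: (word, [(kanji, reading), ...]).
known_words = [
    ("出発", [("出", "しゅっ"), ("発", "ぱつ")]),
    ("自分", [("自", "じ"), ("分", "ぶん")]),
]
candidates = [
    ("自費出版", [("自", "じ"), ("費", "ひ"), ("出", "しゅっ"), ("版", "ぱん")]),
    ("自慢", [("自", "じ"), ("慢", "まん")]),
    ("自発", [("自", "じ"), ("発", "はつ")]),
    ("出自", [("出", "しゅっ"), ("自", "じ")]),  # simplified reading, see the note
]

known = build_known_readings(known_words)
ranked = sorted(candidates, key=lambda c: score(c[1], known))
for word, pairs in ranked:
    print(word, score(pairs, known))
```

Sorting on the (unknown, near) pair puts words with fewer unknown readings first and uses near matches only as a tie-breaker, which reproduces the order of preference in the list above.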
Summary
So, with this information, we can calculate an even "better" n+1. So instead of having to (possibly) learn a completely new reading for a new word, we can learn a new word with readings we already know. Or, at least one reading we already know.
Note:
*This reading is not correct (should be しゅつじ), but for the sake of the example, please let it slide...
