Mighty Morphin Morphology

#73
I figured I should give you an update on what I've been working on.

I've gotten it to a point that I think is quite usable. I've accounted for a lot of reading transformations and can find a solution for 82% of the words/readings in JMdict. A lot of the rest don't have standard readings to begin with, so the success rate among words with standard readings is actually higher. I'm surprised at how many non-standard readings there are (I'm estimating more than 20k).

I've even tagged the types of transformations so you can search specifically for those. As an example, say you have the kanji/reading 人 = ひと. You'll get back every word it knows with that reading. Now, the ひ in ひと can have a ゛(dakuten) or a ゜(handakuten) added to it. Those are two of the tags available, and you can search for only words with a particular tag, or with as many tags as you want. So in this example, we could restrict the output to only words with those two tags (there's also a "regular" tag for no transformation).
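To make the tagging idea concrete, here's a rough sketch in Python. It is not my actual code: the function names, the tag strings other than "regular", and the sample entries are all made up for illustration, and it only looks at voicing of the first kana of a reading.

```python
# Voicing tables for the first kana of a reading (hiragana only).
DAKUTEN = dict(zip("かきくけこさしすせそたちつてとはひふへほ",
                   "がぎぐげござじずぜぞだぢづでどばびぶべぼ"))
HANDAKUTEN = dict(zip("はひふへほ", "ぱぴぷぺぽ"))


def classify_reading(base, observed):
    """Return a tag describing how `observed` differs from `base`, or None.

    Hypothetical helper: tag names are assumptions, and only a change to
    the first kana is considered.
    """
    if not base or not observed:
        return None
    if observed == base:
        return "regular"
    head, rest = base[0], base[1:]
    if head in DAKUTEN and DAKUTEN[head] + rest == observed:
        return "dakuten"
    if head in HANDAKUTEN and HANDAKUTEN[head] + rest == observed:
        return "handakuten"
    return None  # some other transformation, or an unrelated reading


def filter_by_tags(entries, wanted_tags):
    """Keep only (word, reading, tag) entries whose tag is in `wanted_tags`."""
    wanted = set(wanted_tags)
    return [entry for entry in entries if entry[2] in wanted]


# Hypothetical sample data: how 人 is read inside three words.
entries = [
    ("人", "ひと", classify_reading("ひと", "ひと")),
    ("刈り入れ人", "びと", classify_reading("ひと", "びと")),
    ("人間", "にん", classify_reading("ひと", "にん")),
]
voiced = filter_by_tags(entries, ["dakuten", "handakuten"])
```

With that sample, `voiced` contains only the 刈り入れ人 entry, since it is the only one tagged with one of the two requested tags.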

I've also recorded the index of each character, counted both from the start and from the end (so you can search backwards). An index of -1 means the last character, -2 the second-to-last, etc. The index ignores kana, including okurigana, so in a word like 刈り入れ人 = かりいれびと, the indexes would be:
刈 = 0
入 = 1
人 = 2
so you can easily search for something as the nth kanji in the word.
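
The indexing scheme is easy to sketch in Python. Again, this is just an illustration with made-up names, and the kanji test is a simplifying assumption (it only checks the basic CJK Unified Ideographs block, so things like 々 would be missed).

```python
def is_kanji(ch):
    # Assumption: only U+4E00..U+9FFF counts as kanji in this sketch.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_indexes(word):
    """Return (kanji, index_from_start, index_from_end) for each kanji.

    Kana are skipped entirely, so they do not advance either index.
    index_from_end is negative: -1 is the last kanji, -2 the one before it.
    """
    kanji = [ch for ch in word if is_kanji(ch)]
    total = len(kanji)
    return [(ch, i, i - total) for i, ch in enumerate(kanji)]


result = kanji_indexes("刈り入れ人")
# result == [("刈", 0, -3), ("入", 1, -2), ("人", 2, -1)]
```

So 人 can be found either as the kanji at index 2 or as the kanji at index -1, whichever is more convenient for the search.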

Now, the real problem is what do I do with all this information? I'd love some ideas.

As the next step, I'm going to see how I can integrate this into overture's plugin for ranking vocab. I also had the idea of colouring in the readings of characters on the answer side of cards. I know it probably goes against the whole "disfluent = memorize better" concept we've recently read about, but I'll do it anyway. I figure if association helps with memorizing things, it can't hurt to have one more thing to associate a kanji reading with.
Edited: 2011-06-10, 4:36 am