Mighty Morphin Morphology

#9
jettyke Wrote:
overture2112 Wrote: That said, I can get the morphemes in order and with part-of-speech information, so perhaps someone could look at the morpheme output of some sentences and figure out some simple rules to detect various patterns.
Maybe you could find a list of particles, count the particles in each sentence, and sort the sentences by that count.

But there would be problems because some nouns and adjectives are written in hiragana and katakana.
It can already detect particles correctly; just remove them from the blacklist when you do a view/iPlusN. You can also change the default blacklist by editing the code (it's at morph/util.py:14). I haven't experimented much with what makes a good blacklist, so the default of punctuation and particles is just my current personal preference.
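
To make the blacklist idea concrete, here's a rough sketch of filtering by part of speech and sorting sentences by particle count. This is not the actual code in morph/util.py; the `Morpheme` type, the function names, and the tag strings ("助詞" for particles, "記号" for punctuation, as in MeCab/IPADIC-style output) are all assumptions for illustration.

```python
from typing import NamedTuple

class Morpheme(NamedTuple):
    surface: str  # text as it appears in the sentence
    pos: str      # top-level part of speech (assumed IPADIC-style tag)

# Hypothetical default mirroring the post: punctuation and particles.
DEFAULT_BLACKLIST = frozenset({"記号", "助詞"})

def filter_morphemes(morphemes, blacklist=DEFAULT_BLACKLIST):
    """Drop every morpheme whose part of speech is blacklisted."""
    return [m for m in morphemes if m.pos not in blacklist]

def count_pos(morphemes, pos):
    """Count the morphemes in one sentence that carry the given tag."""
    return sum(1 for m in morphemes if m.pos == pos)

def sort_by_particle_count(sentences):
    """Order sentences from fewest to most particles."""
    return sorted(sentences, key=lambda ms: count_pos(ms, "助詞"))

s1 = [Morpheme("猫", "名詞"), Morpheme("が", "助詞"), Morpheme("魚", "名詞"),
      Morpheme("を", "助詞"), Morpheme("食べる", "動詞"), Morpheme("。", "記号")]
s2 = [Morpheme("寒い", "形容詞"), Morpheme("ね", "助詞")]

# Taking particles off the blacklist keeps them in the result.
with_particles = filter_morphemes(s1, blacklist={"記号"})
```

Because the tagger already labels particles, this sidesteps the hiragana/katakana worry: the count is based on the tag, not on what script the word is written in.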

I could also expose a whitelist option (e.g., to only do iPlusN for particles, although you can accomplish this with a very long blacklist too), or an option to filter on the sub-part-of-speech group (3rd column) as well, if people think it'd be useful.
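
Roughly what a whitelist plus sub-part-of-speech filter could look like, as a sketch only. Nothing here exists in the plugin yet; `select`, its parameters, and the example tags are hypothetical, and the tag values assume IPADIC-style output.

```python
from typing import NamedTuple

class Morpheme(NamedTuple):
    surface: str
    pos: str      # top-level part of speech
    sub_pos: str  # sub-part-of-speech group (the "3rd column")

def select(morphemes, whitelist=None, blacklist=(), sub_blacklist=()):
    """Keep morphemes that pass the whitelist (if any) and both blacklists."""
    kept = []
    for m in morphemes:
        # A whitelist of None means "no whitelist": everything passes this step.
        if whitelist is not None and m.pos not in whitelist:
            continue
        if m.pos in blacklist or m.sub_pos in sub_blacklist:
            continue
        kept.append(m)
    return kept

sentence = [
    Morpheme("私", "名詞", "代名詞"),
    Morpheme("は", "助詞", "係助詞"),
    Morpheme("東京", "名詞", "固有名詞"),
    Morpheme("に", "助詞", "格助詞"),
    Morpheme("行く", "動詞", "自立"),
]

only_particles = select(sentence, whitelist={"助詞"})
no_particles_or_names = select(sentence, blacklist={"助詞"},
                               sub_blacklist={"固有名詞"})
```

The whitelist is checked first so that "only particles" is a one-item set instead of a blacklist naming every other part of speech.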

Boy.pockets Wrote: Another thing I thought might be cool (maybe you are already doing this): using the kanji readings to look for the next best word to use. For example, say you already know '出席「しゅっせき」', then an easy one to learn next might be '出廷「しゅってい」' (especially if you already know 'tei' from another word).
Awesome idea! I'll try playing with that soon.
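
A first stab at how the ranking might work, assuming each word is already split into (kanji, reading) pairs. That split is an assumption here, and the function names and sample words are just for illustration.

```python
def known_pairs(known_words):
    """Collect every (kanji, reading) pair seen in the known words."""
    return {pair for word in known_words for pair in word}

def rank_candidates(known_words, candidates):
    """Sort candidates so words sharing the most known pairs come first."""
    known = known_pairs(known_words)

    def score(word):
        return sum(1 for pair in word if pair in known)

    return sorted(candidates, key=score, reverse=True)

# Each word is a tuple of (kanji, reading-as-used-in-this-word) pairs.
known = [
    (("出", "しゅっ"), ("席", "せき")),   # 出席
    (("法", "ほう"), ("廷", "てい")),     # 法廷
]
candidates = [
    (("欠", "けっ"), ("勤", "きん")),     # 欠勤: no known pairs
    (("出", "しゅっ"), ("廷", "てい")),   # 出廷: both pairs known
    (("出", "しゅっ"), ("発", "ぱつ")),   # 出発: one pair known
]

ranked = rank_candidates(known, candidates)
```

With these inputs 出廷 ranks first, since both 出=しゅっ and 廷=てい are already known from other words.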

Problem: given a kanji compound, how can you determine which parts of the reading are from which kanji?
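
One possible approach, sketched below: look up each kanji's candidate readings in a table and backtrack until the pieces exactly cover the word's reading. The tiny `READINGS` table is made up for the example (a real one would have to come from a kanji dictionary), and the only sound change handled is a final つ/く/ち/き becoming っ before the next kanji. Rendaku, okurigana, and irregular readings are not handled.

```python
# Hypothetical per-kanji reading table, just enough for the examples.
READINGS = {
    "出": {"しゅつ", "で", "だ"},
    "席": {"せき"},
    "廷": {"てい"},
}

SOKUON_ENDINGS = "つくちき"  # final kana that may contract to っ

def variants(reading, is_last):
    """Yield the base reading, plus its っ form when another kanji follows."""
    yield reading
    if not is_last and len(reading) > 1 and reading[-1] in SOKUON_ENDINGS:
        yield reading[:-1] + "っ"

def align(word, reading, table=READINGS):
    """Return [(kanji, reading_part), ...] or None if no split fits."""
    if not word:
        # Success only if the whole reading has been consumed.
        return [] if not reading else None
    head, rest = word[0], word[1:]
    for base in table.get(head, ()):
        for form in variants(base, is_last=not rest):
            if reading.startswith(form):
                tail = align(rest, reading[len(form):], table)
                if tail is not None:
                    return [(head, form)] + tail
    return None

split = align("出席", "しゅっせき")
```

Here `split` pairs 出 with しゅっ and 席 with せき. A word whose reading can't be built from the table returns `None`, which is at least an easy way to flag the cases that need a smarter rule.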
Edited: 2011-05-06, 10:39 am