Long story short: I’m wondering what lightweight Japanese NLP libraries people are using in apps (preferably Java-based).
Basically, none of the libraries I've tried reliably finds the base (dictionary) form of Godan verbs, mainly when the verb is in the imperative or potential form. I wouldn't mind this limitation for what I'm doing, but none of the apps I've used seem to have this trouble, so it's making me curious. Maybe I'm searching for the wrong terms or misunderstanding something.
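To make the issue concrete: for Godan verbs the imperative ends in an e-row kana (書け from 書く), and the potential adds る to that (書ける). Below is a rough Java sketch of the reverse mapping I would expect. The class and method names are made up for illustration and are not taken from any library, and the method only produces candidates that would still need checking against a dictionary (for example, it wrongly proposes 食ぶ for the Ichidan verb 食べる).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library mentioned in this post.
class GodanCandidates {
    // e-row endings and the u-row endings they map back to, index by index.
    private static final String E_ROW = "けげせてねべめれえ";
    private static final String U_ROW = "くぐすつぬぶむるう";

    // Returns possible Godan base forms for a surface form.
    // Results are unverified guesses; a dictionary lookup must confirm them.
    static List<String> baseFormCandidates(String surface) {
        List<String> out = new ArrayList<>();
        if (surface == null || surface.length() < 2) {
            return out;
        }
        int n = surface.length();

        // Imperative: stem + e-row kana, e.g. 書け -> 書く
        int i = E_ROW.indexOf(surface.charAt(n - 1));
        if (i >= 0) {
            out.add(surface.substring(0, n - 1) + U_ROW.charAt(i));
        }

        // Potential: stem + e-row kana + る, e.g. 書ける -> 書く
        if (n >= 3 && surface.charAt(n - 1) == 'る') {
            int j = E_ROW.indexOf(surface.charAt(n - 2));
            if (j >= 0) {
                out.add(surface.substring(0, n - 2) + U_ROW.charAt(j));
            }
        }
        return out;
    }
}
```

Calling `GodanCandidates.baseFormCandidates("読める")` gives a single candidate, 読む. The libraries I tried do not seem to get this far for these two conjugations.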
Below are the libraries I've tried, which I found by searching for terms like tokenizer, morphological analyzer, stemmer, and 形態素解析 (morphological analysis). None of them seems to handle the Godan issue.
Sanmoku has been stale for the past 4 years, but it runs well on Android and tokenises pretty accurately.
MeCab seems to be one of the most popular libraries, but it's written in C++.
Kuromoji looked good but seems to be too heavy for Android.
I also found Gosen and Igo, but they didn’t seem suitable for Android.
