Hi all,
I'm another Japanese student. ^_^ Recently I found a tool called MeCab that can analyze Japanese text and return information about the various chunks. My problem is that I don't understand the information it returns. To be more precise, I found the meaning of the various fields for the "chasen" output type, but this library also has a more "detailed" output that seems to use a "fixed format" for the data it shows. As an example, the test tool provided with it shows output like this:
すもももももももものうち
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
Take the first line:
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
The first chunk (すもも) is the original "word" that the information relates to. But what are all the other comma-separated fields? Even after reading the ipadic PDF documentation (ipadic is the dictionary used by the tool), I can't tell what information to expect in these fields, especially because the symbol '*' appears so often in many of them.
Does anyone know more about this and could share it with me? ;-)
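To make the question concrete, here is a minimal Python sketch of how such a line could be split into the word and its fields. The function name split_mecab_line is just something made up for illustration, it is not part of mecab, and the sketch assumes the word and the field list are separated by whitespace as in the output pasted above. It only numbers the fields by position, since their meaning is exactly what I am asking about.

```python
def split_mecab_line(line):
    """Split one output line into (word, list of comma-separated fields).

    Hypothetical helper for illustration only; assumes the word and the
    field list are separated by whitespace, as in the pasted output.
    """
    surface, features = line.split(None, 1)
    return surface, features.split(",")


surface, fields = split_mecab_line("すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ")

# Print each field with its position so the positions can be discussed.
for position, value in enumerate(fields, start=1):
    print(position, value)
```

For the first line this gives nine fields, four of which (positions 3 to 6) are just '*'.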
Thank you