kanji koohii FORUM
Mecab output meaning?


Mecab output meaning? - Suppaman - 2013-04-03

Hi all

I'm another Japanese student. ^_^ Recently I found a tool called MeCab that can analyze Japanese text and return information about the various chunks. My problem is that I don't understand the information it returns. To be more precise, I found the meaning of the fields for the "chasen" output type, but the library also has a "detailed" output that seems to show its data in a fixed format. Just to give an example, the test tool provided shows output like this:

すもももももももものうち
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS

Take the first line:

すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ

The first chunk (すもも) is the original "word" that the information refers to. But what are all the other comma-separated fields? Even after reading the IPADIC PDF documentation (the dictionary used by the tool), I can't tell what information to expect in these fields, especially because the symbol '*' appears very often in many of them.

Does anyone know more about this and can share it with me? ;-)

Thank you


Mecab output meaning? - Oniichan - 2013-04-03

From the MeCab website:
http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html?source=navbar


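表層形\t品詞,品詞細分類1,品詞細分類2,品詞細分類3,活用型,活用形,原形,読み,発音
(surface form\tpart of speech, POS subcategory 1, POS subcategory 2, POS subcategory 3, conjugation type, conjugation form, base form, reading, pronunciation)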
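If you want to process this output in a program, here is a minimal Python sketch that splits one line of the default output into named fields. It assumes the surface form is followed by a tab (as in the format string) or, as in the pasted example above, a space, and it treats '*' as "no value" (the field does not apply to that word). The function and field names are my own, not part of MeCab.

```python
from typing import Dict, Optional

# English names for the default IPADIC feature fields, in output order.
FEATURE_NAMES = [
    "pos",               # 品詞, e.g. 名詞
    "pos_sub1",          # 品詞細分類1, e.g. 一般
    "pos_sub2",          # 品詞細分類2
    "pos_sub3",          # 品詞細分類3
    "conjugation_type",  # e.g. カ変・来ル
    "conjugation_form",  # e.g. 連用形
    "base_form",         # 原形
    "reading",           # 読み
    "pronunciation",     # 発音
]


def parse_line(line: str) -> Dict[str, Optional[str]]:
    """Split one MeCab output line into named fields; '*' becomes None."""
    # The surface form ends at the first tab; fall back to a space
    # for output that was pasted with tabs turned into spaces.
    sep = "\t" if "\t" in line else " "
    surface, features = line.rstrip("\n").split(sep, 1)
    values = features.split(",")
    # Some entries (e.g. unknown words) can have fewer than nine fields;
    # pad them so every name is present.
    values += ["*"] * (len(FEATURE_NAMES) - len(values))
    entry: Dict[str, Optional[str]] = {"surface": surface}
    for name, value in zip(FEATURE_NAMES, values):
        entry[name] = None if value == "*" else value
    return entry


if __name__ == "__main__":
    print(parse_line("すもも\t名詞,一般,*,*,*,*,すもも,スモモ,スモモ"))
```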


Mecab output meaning? - Suppaman - 2013-04-03

Hi

Thank you very much. The funny thing is that the answer to my question was exactly two lines above the same example block I used in this post... Unfortunately the MeCab documentation is written entirely in Japanese and I wasn't able to find the information I needed. Thank you again for your help. ^_^


Mecab output meaning? - lauri_ranta - 2013-04-04

I have posted some notes about MeCab usage at http://lri.me/japanese/notes.html.

$ mecab -F '%m %t %f[6]\n' -E '\n' <<< 来て
来 2 来る
て 6 て

%m: surface form of morpheme
%t: type (0: other, 2: kanji, 3: special character, 4: number, 5: latin, 6: hiragana, 7: katakana)
%f[6]: lexical form of morpheme (field 6 in the default output)
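To illustrate what %m and %f[N] pick out, here is a hypothetical Python helper that applies just those two directives to a line of the default output. It is a sketch, not MeCab's own formatter: %t is left out because the character type is not part of the default output, and any other directive is passed through unchanged.

```python
import re


def format_node(line: str, fmt: str) -> str:
    """Apply %m and %f[N] from a MeCab-style node format to one default-output line."""
    sep = "\t" if "\t" in line else " "
    surface, features = line.split(sep, 1)
    fields = features.split(",")

    def replace(match: re.Match) -> str:
        if match.group(0) == "%m":
            return surface  # surface form of the morpheme
        index = int(match.group(1))
        # In this sketch an out-of-range index yields an empty string.
        return fields[index] if index < len(fields) else ""

    return re.sub(r"%m|%f\[(\d+)\]", replace, fmt)


if __name__ == "__main__":
    for node in ["来\t動詞,自立,*,*,カ変・来ル,連用形,来る,キ,キ",
                 "て\t助詞,接続助詞,*,*,*,*,て,テ,テ"]:
        print(format_node(node, "%m %f[6]"))
```

Run on the default output for 来て, this reproduces the %m and %f[6] columns of the -F example.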

$ mecab <<< 来て
来 動詞,自立,*,*,カ変・来ル,連用形,来る,キ,キ
て 助詞,接続助詞,*,*,*,*,て,テ,テ
EOS

表層形\t品詞,品詞細分類1,品詞細分類2,品詞細分類3,活用型,活用形,原形,読み,発音

%m 表層形 (hyousoukei) surface form
%f[0] 品詞 (hinshi) part of speech (for example 助詞 (joshi) / particle)
%f[1] 品詞細分類1 (hinshi saibunrui) part of speech subcategory (for example 接続助詞 (setsuzoku joshi) / conjunctive particle)
%f[2], %f[3] 品詞細分類2, 品詞細分類3 further subcategories ('*' when not used)
%f[4] 活用型 (katsuyoukata) conjugation type (for example カ変 (kahen)・来ル / irregular conjugation - kuru)
%f[5] 活用形 (katsuyoukei) conjugation form (for example 連用形 (renyoukei) / conjunctive form)
%f[6] 原形 (genkei) lexical form
%f[7] 読み (yomi) reading
%f[8] 発音 (hatsuon) pronunciation
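Since %f[6] is the lexical (base) form, a common use is lemmatizing a whole sentence. A minimal Python sketch, assuming the default output format (one morpheme per line, each sentence terminated by EOS); the function name is hypothetical.

```python
from typing import List


def base_forms(output: str) -> List[str]:
    """Return the base form (feature field 6) of each morpheme in MeCab output."""
    forms = []
    for line in output.splitlines():
        if not line.strip() or line == "EOS":
            continue  # skip blank lines and end-of-sentence markers
        sep = "\t" if "\t" in line else " "
        surface, features = line.split(sep, 1)
        fields = features.split(",")
        # Fall back to the surface form when field 6 is missing or '*'.
        if len(fields) > 6 and fields[6] != "*":
            forms.append(fields[6])
        else:
            forms.append(surface)
    return forms


if __name__ == "__main__":
    sample = ("来\t動詞,自立,*,*,カ変・来ル,連用形,来る,キ,キ\n"
              "て\t助詞,接続助詞,*,*,*,*,て,テ,テ\n"
              "EOS\n")
    print(base_forms(sample))
```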


Mecab output meaning? - Suppaman - 2013-04-04

Hi

Thank you for your additional explanation, it's very useful. Regarding the information returned by MeCab, I have a question about the dictionary used. The installable version of MeCab comes with the IPADIC dictionary. However, the author of the library recently also made another dictionary available for download, called jumandic (download from http://code.google.com/p/mecab/downloads/list). Do you know the difference between these two dictionaries and, more importantly, whether both dictionaries return the same information or change some of the field descriptions?

Thank you


Mecab output meaning? - Suppaman - 2013-04-06

Hi

Can someone explain the difference between 読み and 発音 to me? My translation is "reading" and "pronunciation", but what is the difference between these two terms? As I understand it, they should be the same thing...

Sorry for my ignorance... :-(

Thank you


Mecab output meaning? - tokyostyle - 2013-04-07

Suppaman Wrote:
Can someone explain the difference between 読み and 発音 to me? My translation is "reading" and "pronunciation", but what is the difference between these two terms? As I understand it, they should be the same thing...
When and how these differ probably depends largely on the dictionary used, but consider this example[1]:

雰囲気
読み:ふんいき
発音:ふいんき

The article linked above also lists a couple of other examples.

(Note: IPADIC does not seem to record colloquial variants like this one, so there the two fields usually match. They are not always identical, though: the particle は, for example, is output with 読み ハ and 発音 ワ.)
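A quick way to check this for whatever dictionary you have installed is to compare fields 7 (読み) and 8 (発音) of the default output. A minimal Python sketch; the function name is hypothetical, and the 雰囲気 line in the usage example is a made-up entry built from the example above, not real dictionary output.

```python
from typing import List, Tuple


def reading_differs(output: str) -> List[Tuple[str, str, str]]:
    """List (surface, reading, pronunciation) for morphemes whose 読み and 発音 differ."""
    diffs = []
    for line in output.splitlines():
        if not line.strip() or line == "EOS":
            continue
        sep = "\t" if "\t" in line else " "
        surface, features = line.split(sep, 1)
        fields = features.split(",")
        if len(fields) < 9:
            continue  # entry has no reading/pronunciation fields
        reading, pronunciation = fields[7], fields[8]
        if reading != pronunciation:
            diffs.append((surface, reading, pronunciation))
    return diffs


if __name__ == "__main__":
    # Hypothetical entry illustrating the ふんいき / ふいんき example.
    sample = ("雰囲気\t名詞,一般,*,*,*,*,雰囲気,フンイキ,フインキ\n"
              "の\t助詞,連体化,*,*,*,*,の,ノ,ノ\n"
              "EOS")
    print(reading_differs(sample))
```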