Mecab output meaning?

Reply #1 - 2013 April 03, 2:21 am
Suppaman New member
Registered: 2013-04-03 Posts: 9

Hi all

I'm another Japanese student. ^_^ Recently I found a tool called MeCab that can analyze Japanese text and return information about each chunk. My problem is that I don't understand the information it returns. To be more precise, I found the meaning of the various fields for the "chasen" output type, but the library also has a more "detailed" output that seems to use a fixed format. For example, the bundled test tool shows something like this:

すもももももももものうち
すもも  名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
も      助詞,係助詞,*,*,*,*,も,モ,モ
もも    名詞,一般,*,*,*,*,もも,モモ,モモ
の      助詞,連体化,*,*,*,*,の,ノ,ノ
うち    名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS

Take the first line:

すもも  名詞,一般,*,*,*,*,すもも,スモモ,スモモ

The first chunk (すもも) is the original "word" that the information relates to. But what are all the other comma-separated fields? Even after reading the ipadic PDF documentation (the dictionary used by the tool), I can't tell what information to expect in these fields, especially since the symbol '*' appears in many of them.

Does someone know more and can share it with me? ;-)

Thank you

Reply #2 - 2013 April 03, 5:25 am
Oniichan Member
From: 名古屋 Registered: 2009-02-02 Posts: 269

From the MeCab website:
http://mecab.googlecode.com/svn/trunk/m … rce=navbar


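表層形\t品詞,品詞細分類1,品詞細分類2,品詞細分類3,活用形,活用型,原形,読み,発音
(surface form\tpart of speech, POS subcategory 1, POS subcategory 2, POS subcategory 3, conjugation form, conjugation type, base form, reading, pronunciation)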
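
As a rough illustration of that format line, here is a minimal Python sketch that splits one line of the default output into named fields. The English key names and the function name are my own labels, not MeCab identifiers. For fields 4 and 5 I named them after what the sample values look like (カ変・来ル reads like a conjugation class, 連用形 like an inflected form), which is the reverse of the label order in the format line, so treat those two names as an assumption. I'm also assuming that '*' simply means "no value for this field".

```python
# English labels for the nine comma-separated fields (my own naming, not MeCab's).
FIELD_NAMES = [
    "pos", "pos_sub1", "pos_sub2", "pos_sub3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation",
]


def parse_mecab_line(line):
    """Split one line of default MeCab output into a dict of named fields."""
    # The surface form is separated from the feature list by a tab.
    surface, features = line.rstrip("\n").split("\t", 1)
    values = features.split(",")
    # Pad with '*' in case a line carries fewer than nine fields.
    values += ["*"] * (len(FIELD_NAMES) - len(values))
    # Assumption: '*' is a placeholder for "not applicable"; map it to None.
    fields = {
        name: (None if value == "*" else value)
        for name, value in zip(FIELD_NAMES, values)
    }
    fields["surface"] = surface
    return fields


token = parse_mecab_line("来\t動詞,自立,*,*,カ変・来ル,連用形,来る,キ,キ")
print(token["surface"], token["pos"], token["base_form"], token["reading"])
```

This is a naive split on commas, so it would need more care for a dictionary whose field values can themselves contain commas.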

Reply #3 - 2013 April 03, 5:44 am
Suppaman New member
Registered: 2013-04-03 Posts: 9

Hi

Thank you very much. The funny thing is that the answer to my question was exactly two lines above the example block I copied for this post... Unfortunately the MeCab documentation is written entirely in Japanese and I wasn't able to find the info I needed. Thank you again for your help. ^_^

Reply #4 - 2013 April 04, 3:17 am
lauri_ranta Member
Registered: 2012-03-31 Posts: 139 Website

I have posted some notes about MeCab usage at http://lri.me/japanese/notes.html.

$ mecab -F '%m %t %f[6]\n' -E '\n' <<< 来て
来 2 来る
て 6 て

%m: surface form of morpheme
%t: type (0: other, 2: kanji, 3: special character, 4: number, 5: latin, 6: hiragana, 7: katakana)
%f[6]: lexical form of morpheme (field 6 in the default output)

$ mecab <<< 来て
来    動詞,自立,*,*,カ変・来ル,連用形,来る,キ,キ
て    助詞,接続助詞,*,*,*,*,て,テ,テ
EOS

表層形\t品詞,品詞細分類1,品詞細分類2,品詞細分類3,活用形,活用型,原形,読み,発音

%m 表層形 (hyousoukei): surface form
%f[0] 品詞 (hinshi): part of speech (for example 助詞 (joshi) / particle)
%f[1] 品詞細分類 (hinshi saibunrui): part of speech subcategory (for example 接続助詞 (setsuzoku joshi) / conjunctive particle)
%f[4] 活用形 (katsuyoukei): conjugation form (for example カ変・来ル / irregular conjugation, kuru)
%f[5] 活用型 (katsuyoukata): conjugation (for example 連用形 (ren'youkei) / conjunctive form)
%f[6] 原形 (genkei): lexical form
%f[7] 読み (yomi): reading
%f[8] 発音 (hatsuon): pronunciation
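
If you only have the default output as text, something close to `-F '%m %f[6]\n'` can be reproduced after the fact. The following is a small Python sketch under a few assumptions: the surface form is separated from the features by a tab, the block ends with an EOS line, and the function name and the fallback to the surface form are my own choices, not MeCab behaviour.

```python
# Default output for 来て, copied from the example above (tab after the surface form).
SAMPLE_OUTPUT = """来\t動詞,自立,*,*,カ変・来ル,連用形,来る,キ,キ
て\t助詞,接続助詞,*,*,*,*,て,テ,テ
EOS
"""


def surface_and_lexical_form(mecab_output):
    """Return (surface, lexical form) pairs, roughly like -F '%m %f[6]\\n'."""
    pairs = []
    for line in mecab_output.splitlines():
        # Skip blank lines and the end-of-sentence marker.
        if not line or line == "EOS":
            continue
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        # Field 6 is the lexical form; fall back to the surface form if it is missing.
        lexical = fields[6] if len(fields) > 6 else surface
        pairs.append((surface, lexical))
    return pairs


for surface, lexical in surface_and_lexical_form(SAMPLE_OUTPUT):
    print(surface, lexical)
```

The character type printed by %t is not part of the default output, so it cannot be recovered this way.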

Last edited by lauri_ranta (2013 April 04, 4:11 am)

Reply #5 - 2013 April 04, 3:44 am
Suppaman New member
Registered: 2013-04-03 Posts: 9

Hi

Thank you for the additional explanation, very useful. Regarding the info returned by MeCab, I have a question about the dictionary used. The installable version of MeCab comes with the ipadic dictionary. However, the author of the library recently also made an additional dictionary called jumandic available for download (from http://code.google.com/p/mecab/downloads/list). Do you know the difference between these two dictionaries and, more importantly, whether both return the same info or whether some field descriptions change?

Thank you

Reply #6 - 2013 April 06, 3:17 pm
Suppaman New member
Registered: 2013-04-03 Posts: 9

Hi

Can someone explain the difference between 読み and 発音? My translation is "reading" and "pronunciation", but what is the difference between these two terms? In my understanding they should be the same thing...

Sorry for my ignorance... :-(

Thank you

Reply #7 - 2013 April 07, 9:26 am
tokyostyle Member
From: Tokyo Registered: 2008-04-11 Posts: 720

Suppaman wrote:

Can someone explain the difference between 読み and 発音? My translation is "reading" and "pronunciation", but what is the difference between these two terms? In my understanding they should be the same thing...

When and how this differs is probably largely dependent on the dictionary used, but consider this example[1]:

雰囲気
読み:ふんいき
発音:ふいんき

The article linked above also lists a couple of other examples.

(Note: In IPADIC the two fields are identical for most entries, but not for all of them. For example, the particle は has 読み ハ and 発音 ワ.)
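
One way to check this against your own dictionary is to compare fields 7 and 8 of each output line. Here is a minimal Python sketch of that comparison. The function name is made up, and the sample lines are typed in by hand rather than captured from a run: the は line is what I believe IPADIC prints for the topic particle, so treat it as illustrative and verify it with your own install.

```python
def reading_vs_pronunciation(lines):
    """Return (surface, reading, pronunciation) for lines where fields 7 and 8 differ."""
    differences = []
    for line in lines:
        if not line or line == "EOS" or "\t" not in line:
            continue
        surface, features = line.split("\t", 1)
        fields = features.split(",")
        # Lines with fewer than nine fields have no reading/pronunciation to compare.
        if len(fields) < 9:
            continue
        reading, pronunciation = fields[7], fields[8]
        if reading != pronunciation:
            differences.append((surface, reading, pronunciation))
    return differences


# Hand-typed sample lines in the default output format (illustrative only).
SAMPLE_LINES = [
    "来\t動詞,自立,*,*,カ変・来ル,連用形,来る,キ,キ",
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ",
    "EOS",
]

for surface, reading, pronunciation in reading_vs_pronunciation(SAMPLE_LINES):
    print(surface, reading, pronunciation)
```

Feeding the real output of a large text through the same function would show how often the two fields actually diverge for a given dictionary.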
