Hi all
I'm developing a software tool for Japanese study. Currently I'm writing the code to import EDICT2 content into an internal database. However, because of my poor knowledge of the Japanese language, I have some doubts about how the data is formatted inside the EDICT2 file. The official documentation explains the following:
-------------------------------------------------------------------------------
the EDICT2 file is in an expanded form of the original EDICT format. The main differences are the inclusion of multiple kanji headwords and readings, and the inclusion of cross-reference and other information fields, e.g.:
KANJI-1;KANJI-2 [KANA-1;KANA-2] /(general information) (see xxxx) gloss/gloss/.../
-------------------------------------------------------------------------------
What I understand from this explanation is that each line of the file contains data following this schema:
kanji1;kanji2;kanji3 [reading of kanji1;reading of kanji2;reading of kanji3]
However, checking the file, it seems this schema is not correct. I can find lines like:
嘈囃;そう囃 [そうざつ]
Does this mean both of these headwords share the same single reading? Or again:
噯;噯気;噫気;噯木(iK) [おくび(噯,噯気);あいき(噯気,噫気,噯木)]
Is this a different way of assigning readings to specific headwords? In this case it seems the headword 噯気 has two different readings.
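To make the question concrete, here is a minimal Python sketch of how I would parse the headword part if my guess is right. It rests on two assumptions that I have not confirmed: a reading with no parenthesised list applies to every headword, and a reading followed by a list such as (噯,噯気) applies only to the headwords named there. The names `parse_head`, `split_item` and `TAGS` are my own, and the set of marker tags is only a guess based on the (iK) in the example above.

```python
import re

# Marker tags such as (iK) are treated as annotations, not as restrictions.
# This set is an assumption; the real file may use other tags as well.
TAGS = {"iK", "ik", "oK", "ok", "io", "P"}

PAREN = re.compile(r"\(([^()]*)\)")


def split_item(item):
    """Return the bare text of an item and the contents of its (...) groups."""
    base = PAREN.sub("", item)
    groups = PAREN.findall(item)
    return base, groups


def parse_head(head):
    """Parse 'K1;K2 [R1;R2(K1)]' into (headwords, {reading: [headwords]})."""
    m = re.match(r"(\S+)(?:\s+\[([^\]]*)\])?", head.strip())
    if m is None:
        raise ValueError("unrecognised headword part: %r" % head)
    kanji = [split_item(k)[0] for k in m.group(1).split(";")]
    readings = {}
    for item in (m.group(2) or "").split(";"):
        if not item:
            continue
        base, groups = split_item(item)
        # Any group that is not a known tag is read as a headword restriction.
        restricted = [k for g in groups if g not in TAGS for k in g.split(",")]
        # Assumption: no restriction means the reading applies to all headwords.
        readings[base] = restricted or list(kanji)
    return kanji, readings


if __name__ == "__main__":
    print(parse_head("嘈囃;そう囃 [そうざつ]"))
    print(parse_head("噯;噯気;噫気;噯木(iK) [おくび(噯,噯気);あいき(噯気,噫気,噯木)]"))
```

Under this interpretation, 噯気 would end up listed under both おくび and あいき, which is exactly the case I am unsure about.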
Can someone explain the correct format of the file in more detail, or point me to a site with a better explanation than the "official" one?
Thank you
