#1
Hi all

I'm a student of Japanese and also a programmer. I recently found the MeCab library while searching for a way to "parse" Japanese text (basically, to put spaces between words). I found the library very useful and decided to develop a free tool based on it to help me and other students like me. I have also made some additions of my own, and I plan to improve it in the future. I now have a first beta version, and I need the help of some Japanese experts on the forum to test it. Currently the tool has two main sections:

1 - Parse Japanese text using the MeCab engine and show the text with furigana
2 - Converter between romaji, hiragana and katakana, written by me

Regarding point one, I have the following problem: MeCab splits Japanese text into chunks. I would like to obtain Japanese text with spaces between words, but MeCab splits too finely. For example, a verb is split into its stem and its suffix. In that case I want to show the two chunks as a single word, as it really is. I can add rules to join such chunks, but I don't know the rules since I'm at beginner level. If you want to help, please enter various texts and, wherever a space should not be there, report the rule to apply together with the test text, so that I can reproduce the issue and verify that the fix is correct.
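A joining rule of the kind described above could be sketched as follows. This is only a heuristic under stated assumptions: the tokens are assumed to arrive as (surface, part of speech, sub-category) triples using IPADIC-style tags, and the function name `merge_inflections` is made up for this example. It glues auxiliary verbs (助動詞) and the conjunctive particles て/で onto a preceding verb or adjective, and leaves everything else alone.

```python
def merge_inflections(tokens):
    """Join verb/adjective stems with the suffix morphemes that follow them.

    `tokens` is a list of (surface, pos, pos_detail) triples. The tag names
    are assumed to follow the IPADIC convention (動詞 = verb, 形容詞 = adjective,
    助動詞 = auxiliary verb, 接続助詞 = conjunctive particle).
    """
    chunks = []  # each entry is [merged surface, part of speech of the first token]
    for surface, pos, pos_detail in tokens:
        attachable = pos == "助動詞" or (
            pos == "助詞" and pos_detail == "接続助詞" and surface in ("て", "で")
        )
        # Only attach when the current chunk started with a verb or adjective,
        # so that e.g. a noun followed by です stays as two words.
        if attachable and chunks and chunks[-1][1] in ("動詞", "形容詞"):
            chunks[-1][0] += surface
        else:
            chunks.append([surface, pos])
    return [surface for surface, _ in chunks]


# Hand-written sample input for 私は寿司を食べました (not real MeCab output).
sample = [
    ("私", "名詞", "代名詞"),
    ("は", "助詞", "係助詞"),
    ("寿司", "名詞", "一般"),
    ("を", "助詞", "格助詞"),
    ("食べ", "動詞", "自立"),
    ("まし", "助動詞", "*"),
    ("た", "助動詞", "*"),
]
print(" ".join(merge_inflections(sample)))
```

Other cases, such as dependent verbs (食べて + いる) or passive and causative suffixes, would need further rules; those are exactly the reports that would be useful.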

Regarding point two, I tried to apply all the conversion rules I know between kana and romaji, but I find it hard to tell whether I got the extended katakana right. For me that is the most difficult part, since there are variations depending on the romanization system used (Hepburn and others). Error reports are welcome here too.
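To illustrate where the extended katakana cause trouble, here is a minimal sketch of a longest-match lookup, not the code used in the tool. The table is deliberately tiny and uses common Hepburn-style spellings; the names `KATAKANA_TO_ROMAJI` and `katakana_to_romaji` are invented for the example. The point is that two-character combinations such as ティ must be tried before single kana, otherwise ティ would be read as テ + ィ.

```python
# Partial table: a few extended combinations plus the single kana needed
# for the examples. A real converter needs the full kana set.
KATAKANA_TO_ROMAJI = {
    "ファ": "fa", "フィ": "fi", "フェ": "fe", "フォ": "fo",
    "ティ": "ti", "ディ": "di",
    "ウィ": "wi", "ウェ": "we", "ウォ": "wo",
    "シェ": "she", "ジェ": "je", "チェ": "che",
    "ア": "a", "イ": "i", "ウ": "u", "エ": "e", "オ": "o",
    "フ": "fu", "テ": "te", "ル": "ru", "ブ": "bu",
}
MAX_KEY_LENGTH = max(len(key) for key in KATAKANA_TO_ROMAJI)


def katakana_to_romaji(text):
    """Convert katakana to romaji, always trying the longest table key first."""
    result = []
    i = 0
    while i < len(text):
        for size in range(MAX_KEY_LENGTH, 0, -1):
            piece = text[i:i + size]
            if piece in KATAKANA_TO_ROMAJI:
                result.append(KATAKANA_TO_ROMAJI[piece])
                i += len(piece)
                break
        else:
            # Unknown character: pass it through unchanged.
            result.append(text[i])
            i += 1
    return "".join(result)


print(katakana_to_romaji("ファイル"))  # fairu
print(katakana_to_romaji("ウェブ"))    # webu
```

Supporting another romanization system would then mostly be a matter of swapping the table, which makes it easier to test each system separately.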

The download link for the tool is:

https://dl.dropboxusercontent.com/u/64769600/jtool.zip

The file is around 30 MB since it includes the IPADIC dictionary. Simply unzip it into a folder and launch j-tool.exe

Thank you to everyone who is willing to help me :)
#2
The next step, once the text parsing has been "stabilized", will be to add a feature for importing content from the EDICT Japanese dictionary. Once a dictionary database has been created, it will be possible to show the translation of each word, maybe below the furigana...
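As a rough idea of what the import step might look like, the sketch below parses single dictionary lines. It assumes the classic EDICT line layout of `WORD [READING] /gloss/gloss/`, with the bracketed reading omitted for kana-only entries; the function name `parse_edict_line` and the two sample lines are made up for illustration and are not copied from the dictionary file.

```python
import re

# Assumed layout: WORD [READING] /gloss 1/gloss 2/
# The " [READING]" part is optional (kana-only entries have no brackets).
EDICT_LINE = re.compile(
    r"^(?P<word>\S+)(?: \[(?P<reading>[^\]]+)\])? /(?P<glosses>.*)/$"
)


def parse_edict_line(line):
    """Return (word, reading, glosses) or None if the line does not match."""
    match = EDICT_LINE.match(line.strip())
    if match is None:
        return None
    word = match["word"]
    reading = match["reading"] or word  # kana-only entry: reading is the word
    glosses = [gloss for gloss in match["glosses"].split("/") if gloss]
    return word, reading, glosses


# Illustrative lines written for this example.
print(parse_edict_line("食べる [たべる] /(v1,vt) to eat/"))
print(parse_edict_line("ああ /(int) ah!/"))
```

The parsed tuples could then be stored in whatever database the tool ends up using, keyed by the word as MeCab reports it.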

Of course, any suggestions for useful features to add are welcome. ;)
#3
falsinsoft Wrote: 2 - converter between romaji, hiragana and katakana made by me
Have a look at this
http://www.sharktime.com/us_wReplace.html

Quote:wReplace is useful for language learning. It allows you to convert between different notations/writing systems, and to approximately phonetically transcribe text. Possible applications:

Japanese, text conversion both ways:
Romaji ↔ Hiragana,
Romaji ↔ Katakana,
Katakana ↔ Hiragana.
Russian; Cyrillic, conversion into Latin phonetic transcription ISO 9-1995.

wReplace can be used free of charge.
It can convert HUUUGe texts in a jiffy.
#4
Hi

Thank you for your reply, even if I don't quite understand the point. I have already written a conversion engine between kana and romaji, so I don't need one. As for the basic rules, I'm fairly sure I implemented them correctly. My doubts concern the different ways to "translate" sounds not originally present in the Japanese language. From what I understand, this is covered by the extended katakana, as described at the link below:

http://en.wikipedia.org/wiki/Hepburn_romanization

I'm not sure I have applied all the rules correctly, so I'm looking for someone who can test the tool and report problems to me...
Edited: 2013-07-10, 6:59 am
#5
If anyone is interested, here is a screenshot of the tool (don't ask me what the Japanese says, since I took this random text from an online site :) )

[Image: j-tool-1.0-beta-1.jpg]
Edited: 2013-07-10, 3:30 pm
#6
Hi all

The first official version, 0.1, of this tool has been released. If any Japanese experts are able to, please test it and report anything incorrect to me. New features will be added in the future. ^_^

Info page:

http://falsinsoft-software.blogspot.com/...-tool.html

Download link:

http://www.softpedia.com/get/PORTABLE-SO...Tool.shtml

Screenshot

[Image: j-tool-.01-2.jpg]
#7
Haven't tried it but:

- You're not annotating numbers correctly, you're annotating each digit separately as if it were "one one" instead of "eleven".

- Furigana (the transliterating kana) traditionally goes on top of the kanji, when written left-to-right like this. Having it below feels unnatural.
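On the number point, one possible fix is to join consecutive digit tokens before annotating them. The sketch below assumes the parser hands over each digit as its own token, as the screenshot suggests; the function names are invented for the example, and the reading function only covers 1 to 99, where no sound changes occur.

```python
from itertools import groupby

DIGIT_READINGS = ["", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"]


def merge_digit_runs(tokens):
    """Join runs of consecutive single-digit tokens into one number token."""
    merged = []
    for is_number, run in groupby(tokens, key=str.isdecimal):
        run = list(run)
        if is_number:
            merged.append("".join(run))
        else:
            merged.extend(run)
    return merged


def reading_under_100(n):
    """Kana reading for 1..99 (larger numbers need extra rules, e.g. さんびゃく)."""
    if not 1 <= n <= 99:
        raise ValueError("only 1..99 is handled in this sketch")
    tens, ones = divmod(n, 10)
    if tens == 0:
        tens_part = ""
    elif tens == 1:
        tens_part = "じゅう"
    else:
        tens_part = DIGIT_READINGS[tens] + "じゅう"
    return tens_part + DIGIT_READINGS[ones]


tokens = merge_digit_runs(["1", "1", "時"])
print(tokens)                             # ['11', '時']
print(reading_under_100(int(tokens[0])))  # じゅういち
```

Readings also change with the counter that follows the number, so a complete solution would have to look at the next token as well.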