Hi all
I'm a student of Japanese and also a programmer. I recently found the MeCab library while searching for a way to "parse" Japanese text (basically, to put spaces between words). I found the library very useful and decided to develop a free tool with it to help me and other students like me. I also made some additions of my own, and I plan to improve the tool in the future. I now have a first beta version, and I need help from the Japanese experts on the forum to test it. The tool currently has two main sections:
1 - A parser that splits Japanese text using the MeCab engine and shows the text with furigana
2 - A converter between romaji, hiragana and katakana, written by me
Regarding point one, I have the following problem: MeCab splits Japanese text into chunks. I would like to obtain Japanese text with spaces between words, but MeCab splits too finely. For example, a verb is split into its stem and its suffix. In that case I want to show the two chunks as a single word, as it really is. I can add rules to join such chunks, but I don't know the rules since I'm at beginner level. If you want to help, please enter various texts and, wherever a space should not appear, report the rule to apply together with the test text, so that I can reproduce the issue and verify that the fix is correct.
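To make the kind of rule I mean concrete, here is a minimal sketch in Python. It does not call MeCab and is not the tool's actual code. It assumes each chunk is already available as a (surface, part of speech) pair using IPADIC-style tags, and the sample chunk list is written by hand for illustration. The single rule shown (attach an auxiliary verb, 助動詞, to a preceding verb, adjective or auxiliary verb) is only my assumption of a reasonable starting point; the names merge_chunks, ATTACHABLE and HOSTS are hypothetical.

```python
from typing import List, Tuple

# A chunk is (surface form, part-of-speech tag). Tags follow IPADIC naming.
Token = Tuple[str, str]

# Assumed rule: auxiliary verbs attach to the chunk before them...
ATTACHABLE = {"助動詞"}
# ...but only when that chunk is a verb, an adjective, or another auxiliary.
HOSTS = {"動詞", "形容詞", "助動詞"}


def merge_chunks(tokens: List[Token]) -> List[str]:
    """Join chunks that should be displayed as one word."""
    words: List[List[str]] = []  # each entry: [surface so far, tag of last chunk]
    for surface, pos in tokens:
        if words and pos in ATTACHABLE and words[-1][1] in HOSTS:
            words[-1][0] += surface  # glue the suffix onto the previous word
            words[-1][1] = pos       # so a following auxiliary can chain on
        else:
            words.append([surface, pos])
    return [w[0] for w in words]


# Hand-written chunks for 私は寿司を食べました (illustrative, not real MeCab output).
SAMPLE: List[Token] = [
    ("私", "名詞"), ("は", "助詞"), ("寿司", "名詞"), ("を", "助詞"),
    ("食べ", "動詞"), ("まし", "助動詞"), ("た", "助動詞"),
]

if __name__ == "__main__":
    print(" ".join(merge_chunks(SAMPLE)))  # 私 は 寿司 を 食べました
```

With this rule a noun followed by です stays as two chunks, because nouns are not in HOSTS. Cases like that are exactly the ones where I need to know what the correct rule is.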
Regarding point two, I tried to apply all the conversion rules I know between kana and romaji, but I have trouble judging whether I handled the extended katakana correctly. For me that is the most difficult part, since the spellings vary depending on the romanization system used (Hepburn and others). Error reports are welcome here too.
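As an illustration of where the difficulty lies, here is a small sketch of a katakana to romaji conversion in Python. It is not the converter in the tool. It assumes Hepburn-style spellings, the table is deliberately partial, and the names TABLE and katakana_to_romaji are hypothetical. The idea is to try the two-character extended combinations (such as ファ or チェ) before single kana, and to treat the small ッ as doubling the next consonant.

```python
# Partial table, Hepburn-style spellings (an assumption; other systems differ).
TABLE = {
    "ア": "a", "イ": "i", "ウ": "u", "エ": "e", "オ": "o",
    "カ": "ka", "ク": "ku", "シ": "shi", "チ": "chi", "ツ": "tsu",
    "テ": "te", "ト": "to", "フ": "fu", "マ": "ma", "ル": "ru", "ン": "n",
    # Extended katakana: a full-size kana followed by a small vowel.
    "ファ": "fa", "フィ": "fi", "フェ": "fe", "フォ": "fo",
    "ティ": "ti", "ディ": "di", "チェ": "che", "シェ": "she", "ジェ": "je",
    "ウィ": "wi", "ウェ": "we", "ウォ": "wo", "ヴァ": "va",
}

VOWELS = "aiueo"


def katakana_to_romaji(text: str) -> str:
    out = []
    i = 0
    geminate = False  # True right after a small ッ
    while i < len(text):
        if text[i] == "ッ":
            geminate = True
            i += 1
            continue
        # Longest match first: two characters, then one.
        for size in (2, 1):
            chunk = text[i:i + size]
            if chunk in TABLE:
                roma = TABLE[chunk]
                break
        else:
            chunk = text[i]
            roma = chunk  # unknown characters pass through unchanged
        if geminate and roma[0].isascii() and roma[0] not in VOWELS:
            # Hepburn writes ッチ as "tchi"; otherwise double the consonant.
            roma = ("t" if roma.startswith("ch") else roma[0]) + roma
        geminate = False
        out.append(roma)
        i += len(chunk)
    return "".join(out)


if __name__ == "__main__":
    for word in ("ファイル", "チェック", "マッチ"):
        print(word, katakana_to_romaji(word))
```

The long vowel mark ー and most of the kana are left out on purpose. Handling them, and choosing between the different romanization systems, is the part where I would like feedback.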
The download link for the tool is:
https://dl.dropboxusercontent.com/u/64769600/jtool.zip
The file is around 30 MB because it includes the IPADIC dictionary. Simply unzip it into a folder and launch j-tool.exe
Thank you to everyone who is willing to help me.
