toshiromiballza wrote:
Any chance one of you (Savii, lauri_ranta, cangy) would release your "furiganizer" scripts to the public? I'm not aware of any software that does this automatically in bulk, except for an ancient program called JGloss, which is really buggy for me and produces terrible results when it does work (it picks the first entry from EDICT instead of, for example, preferring entries marked as common). If output as HTML5 ruby tags were possible, so much the better.
I have posted two Ruby scripts I use at http://jptxt.net/scripts.html. The first one adds furigana based on hiragana versions of words or sentences. It gets a few words like 物の怪/もののけ wrong though. The second one uses MeCab to generate furigana, like https://github.com/dae/ankiplugins/blob … reading.py. When I tried using Core 6000 sentences as input, about 5% had at least one difference from the correct furigana, but many of the differences are not necessarily errors.
Savii and cangy have added (almost?) perfect furigana for Core 6000 sentences though. I think I figured out how to do it from Savii's description, but I'm still looking for other ways to add furigana as well.
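For anyone who wants to see the general idea behind the first approach, here is a minimal sketch. It is not the script linked above: `furigana_html` is a hypothetical helper, and it assumes each entry comes with a hiragana reading and that any kana in the word appear unchanged in that reading. Kanji runs become capture groups in a pattern that is matched against the reading, and the result is emitted as HTML5 ruby tags.

```ruby
# encoding: utf-8

# Hypothetical helper (an assumption for illustration, not the linked script).
# Aligns a word with its hiragana reading and wraps each kanji run in <ruby>.
def furigana_html(word, reading)
  # Split the word into alternating kanji runs and non-kanji runs.
  parts = word.scan(/\p{Han}+|[^\p{Han}]+/)

  # Kanji runs become capture groups; kana must match the reading literally.
  pattern = parts.map { |part|
    part =~ /\p{Han}/ ? '(.+)' : Regexp.escape(part)
  }.join

  match = /\A#{pattern}\z/.match(reading)
  return word unless match # reading does not fit the word, leave it untouched

  captures = match.captures
  parts.map { |part|
    part =~ /\p{Han}/ ? "<ruby>#{part}<rt>#{captures.shift}</rt></ruby>" : part
  }.join
end

puts furigana_html('食べる', 'たべる')
puts furigana_html('物の怪', 'もののけ')
```

Words like 物の怪/もののけ are ambiguous for this kind of matching because the literal の can line up with either の in the reading. The greedy groups used here happen to split it as もの/け, while lazy groups would split it as も/のけ, so neither choice is safe in general.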
Last edited by lauri_ranta (2013 November 28, 11:45 am)
Oniichan
Member
From: 名古屋
Registered: 2009-02-02
Posts: 269
lauri_ranta wrote:
I have posted two Ruby scripts at http://lri.me/japanese/Notes.html. The first one adds furigana from hiragana versions, and the error rate is maybe 0.1% for vocabulary. The second one uses mecab to generate furigana (like Anki's Japanese support plugin / addons/japanese/reading.py), but it gets some readings wrong in about 5% of Core 6000 sentences.
I saved your scripts as ruby files and had success with the furigana one, getting this output when testing with the commented-out section:
However, when I try to run the mecab script I get this error:
I also tested mecab.exe with the following command to make sure it is set up correctly:
C:\Users\Administrator\scripts\Ruby\Sandbox>mecab -O wakati in.txt -o out.txt
which yielded an output file with spaces between the morphemes, as expected.
For further reference, I'm running the scripts from a command prompt on a Windows 7 machine using Ruby 1.9.3p125. The scripts are saved as UTF-8 files and my cmd code page is set to UTF-8 as well.
Any ideas why the script breaks down when I run it?
Oniichan
Here is another output when the code page is set to '932 (ANSI/OEM - Japanese Shift-JIS)'
I'm guessing that the console's code page has no effect on the script, since the script doesn't print anything to the screen, but I'm not so sure about the encoding of MeCab's dictionaries. My MeCab dictionary was set to UTF-8 during setup. Do I need to reinstall it as EUC-JP or something to make this script work?
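One way to narrow this down is to check, from Ruby itself, which encoding MeCab's output actually arrives in. The sketch below is only an illustration under a few assumptions: `mecab` is on the PATH, the dictionary uses the IPADIC feature layout (reading in the eighth comma-separated field), and the helper names `utf8?`, `mecab_raw` and `readings` are made up for this example.

```ruby
# encoding: utf-8
require 'open3'

# True if the given bytes form valid UTF-8. Output from a dictionary built
# as EUC-JP or Shift_JIS would normally fail this check for Japanese text.
def utf8?(bytes)
  bytes.dup.force_encoding('UTF-8').valid_encoding?
end

# Assumption: mecab is on PATH. The pipes are binary so that Ruby does not
# transcode anything or touch line endings on Windows.
def mecab_raw(text)
  out, status = Open3.capture2('mecab',
                               stdin_data: text.encode('UTF-8'),
                               binmode: true)
  raise "mecab exited with status #{status.exitstatus}" unless status.success?
  raise 'mecab output is not valid UTF-8; check the dictionary charset' unless utf8?(out)
  out.force_encoding('UTF-8')
end

# Parse output lines of the form "surface\tfeature,feature,...".
# Assumption: IPADIC layout, where field 8 is the katakana reading.
# Unknown words have no reading field, so their reading is nil.
def readings(mecab_output)
  mecab_output.each_line.map { |line|
    line = line.chomp
    next if line == 'EOS' || line.empty?
    surface, features = line.split("\t", 2)
    next if features.nil?
    reading = features.split(',')[7]
    [surface, reading && reading.tr('ァ-ン', 'ぁ-ん')] # katakana to hiragana
  }.compact
end
```

With MeCab installed, `readings(mecab_raw('食べる'))` would be the call to try. If `mecab_raw` raises the UTF-8 error, the dictionary and the script disagree about the charset, which is a separate question from the console's code page.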
Last edited by Oniichan (2013 March 06, 11:22 pm)