kanji koohii FORUM
MeCab Japanese analyzer in your web browser - Printable Version


MeCab Japanese analyzer in your web browser - aldebrn - 2014-09-17

Sorry for making this a new thread instead of adding to the "Software" thread. This is an alpha-level port of MeCab to JavaScript, so that it can run in any recent web browser, and I'm soliciting bug reports and feature requests.

Please try it out at http://fasiha.github.io/mecab-emscripten/

Background: MeCab is technically a morphological analyzer and part-of-speech tagger. What that means to me, as a non-linguist, is "it puts spaces between Japanese words (-O wakati), it converts Japanese to katakana (-O yomi), it breaks down entire sentences into parts of speech (-O chasen), and through some crazy magic, Damien Elmes' Japanese Support Anki plugin can add reasonably accurate furigana to Japanese text (not yet supported here, since it uses Kakasi, another tool, in conjunction with MeCab, with some Python glue in between)".

It's usually a pain in the a$$ to install. A kind Koohiite helped me (and many others) by making a video tutorial on getting it set up in Windows 7. You know we're still in the 1990s when someone has to make a video tutorial on installing and using a piece of software.

So as a mini-project I put it through the Emscripten cross-compiler, which compiled the C++ source code to JavaScript, so now it runs in Firefox, Chrome, Safari, and Chrome on iPhone (the browsers I've tested so far that work; Safari on iPhone doesn't work yet). It takes a few seconds to download the 50 MB dictionary, but once it's ready, type or paste some 日本語, enter a flag like "-O chasen", click Submit, and get your result.

It's worked pretty well on all the input I've given it, but I'm sure there are flaws to fix and improvements to add. Please feel free to post here if you don't want to make a GitHub account to post on the bug tracker there (https://github.com/fasiha/mecab-emscripten/issues).


MeCab Japanese analyzer in your web browser - yogert909 - 2014-11-13

I forgot to say thanks for this.

A few months ago I spent an evening trying to get MeCab installed at home and finally said to heck with MeCab. Today I have a need for it again and I remembered you made this. It works beautifully so far!

Thanks again!


MeCab Japanese analyzer in your web browser - yogert909 - 2014-11-13

I have a feature request, if it's not too much trouble of course. I need to batch process thousands of text files and save them out with an appended file name. Any chance a file upload / batch processor could be added easily?


MeCab Japanese analyzer in your web browser - aldebrn - 2014-11-13

yogert909 Wrote: I have a feature request, if it's not too much trouble of course. I need to batch process thousands of text files and save them out with an appended file name. Any chance a file upload / batch processor could be added easily?
While I think about how to do this effectively, a stopgap might be to concatenate all your files into a single one, with file separation markers, i.e., `for file in *.txt; do echo "=== $file ==="; cat "$file"; done > all.texts`, and copy-paste that into MeCab. I just pasted a 280 KB file (Natsume Soseki's Botchan) into MeCab in wakati mode and it produced the output immediately. I'll test it some more later to see how it handles, e.g., 15 MB inputs. (Firefox is supposed to run Emscripten-compiled code a little faster, 10~20% faster than Chrome, thanks to asm.js.)
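
If it helps, the marker idea could be scripted at both ends. What follows is only a rough Python sketch, not something built into the web page: the function names (`combine`, `split_combined`, `write_parts`) and the `_mecab` suffix are made up for illustration, and it assumes the `=== name ===` marker lines come back from MeCab unchanged. That assumption may not hold (wakati mode, for instance, might insert spaces inside a marker), so check a small sample first and adjust the marker pattern if needed.

```python
import re
from pathlib import Path

# Marker format matches the shell one-liner: "=== filename ===" on its own line.
MARKER = re.compile(r"^=== (.+) ===$")


def combine(texts: dict[str, str]) -> str:
    """Join named texts into one string, each preceded by a marker line."""
    parts = []
    for name, body in texts.items():
        parts.append(f"=== {name} ===")
        parts.append(body.rstrip("\n"))
    return "\n".join(parts) + "\n"


def split_combined(text: str) -> dict[str, str]:
    """Split marker-delimited text back into {name: body}.

    ASSUMPTION: marker lines are exactly as written by combine(); lines
    before the first marker are ignored.
    """
    result: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            current = match.group(1)
            result[current] = []
        elif current is not None:
            result[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in result.items()}


def write_parts(parts: dict[str, str], out_dir: str, suffix: str = "_mecab") -> None:
    """Write each part to out_dir, appending a (hypothetical) suffix to the stem."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, body in parts.items():
        src = Path(name)
        target = out / f"{src.stem}{suffix}{src.suffix}"
        target.write_text(body, encoding="utf-8")
```

Usage would be something like: build the combined text with `combine`, paste it into the page, save the result to a file, then run `write_parts(split_combined(result_text), "out")` so that `a.txt` ends up as `out/a_mecab.txt`. The round trip is exact only for non-empty files that end in a newline.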