
LINUX IME/Handwriting Recognition

#1
So I was thinking about how lacking the IMEs for Chinese and Japanese are in Linux, particularly given the lack of documentation for some of them. That's not to say they can't be used with a little effort and practice (and RTFMing), but it's still not great. So, first question: what IMEs do you use for Asian languages (either Chinese or Japanese)?

On a related topic, one of the things I miss most about Windows is the handwriting recognition... and then I found Cellwriter for Linux. Now, Cellwriter is missing its full potential, because there's no way of looking up characters to add to its database of thousands and thousands of characters, which means the only way through is to spend a few days adding each one manually. Screw up on a character, and you have to trawl through the entire list to find it and replace the sample.

So... is there anything like Cellwriter for Chinese characters already in Linux that I'm missing (or even just a really good IME)? And secondly, would anyone be interested in completing the database with good handwriting samples as a collaborative project? It wouldn't take so long if there was more than one person doing it, besides which it's a waste of anyone's time entering samples if there are already some out there.

I only ask as I'm doing Cantonese at the moment (with no knowledge of Mandarin), which therefore limits the use of pinyin based methods.

~DJ
#2
Personally I use the SKK input method for Japanese. Not very exciting, but it does the job (and I kind of like the way that it makes you explicitly mark the okurigana boundary by typing 話した as "HanaShita").
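
To make the okurigana marking concrete, here is a minimal Python sketch of the idea. It is only an illustration, not SKK's actual code: the function name `split_okurigana` is made up, and it assumes the first capital letter starts the word and a second capital marks where the okurigana begins. It does no romaji-to-kana conversion or dictionary lookup.

```python
from typing import Tuple


def split_okurigana(keys: str) -> Tuple[str, str]:
    """Split SKK-style keystrokes into (stem, okurigana) romaji.

    Hypothetical helper for illustration only. Assumes the first
    uppercase letter starts the word and the next uppercase letter
    marks the okurigana boundary.
    """
    if not keys or not keys[0].isupper():
        raise ValueError("expected input to start with an uppercase letter")
    # Look for the second uppercase letter; it marks the boundary.
    for i in range(1, len(keys)):
        if keys[i].isupper():
            return keys[:i].lower(), keys[i:].lower()
    # No second capital: the word has no okurigana.
    return keys.lower(), ""


stem, okuri = split_okurigana("HanaShita")
print(stem, okuri)
```

With "HanaShita" this separates the reading of the kanji part ("hana", for 話) from the trailing kana ("shita", for した), which is the boundary SKK asks you to mark by hand.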

I think that kanji recognition in general is a hard problem and one where the Free implementations still trail well behind the commercial ones. There are a few more or less promising starts on the problem (e.g. zinnia, tomoe, tegaki), but no obvious winner yet.

You're right also that documentation for IMEs in general can be rather sketchy. This is particularly true if your native language isn't Japanese, because most documentation is aimed at Japanese speakers so (a) it's in Japanese and (b) it tends to assume you've configured the system to run with a Japanese locale. This is just a consequence of the fact that second-language-learners aren't a very large set of people...
#3
WOW, cheers for the links, I had no idea there was actually any semblance of handwriting recognition available :D But yeah, you're right, it's a shame us language learners get ignored :(
#4
donjorge22 Wrote:So... is there anything like Cellwriter for Chinese characters already in Linux that I'm missing (or even just a really good IME)?
yeah, as pm215 says, open source unix recognition engines are pretty much limited to cellwriter, tomoe and zinnia

the best recogniser for unix I've seen is sharp's crim on the zaurus (linux/qpe), but there's no source...

donjorge22 Wrote:And secondly, would anyone be interested in completing the database with good handwriting samples as a collaborative project? It wouldn't take so long if there was more than one person doing it, besides which it's a waste of anyone's time entering samples if there are already some out there.
I had this idea, but it didn't go anywhere: http://code.google.com/p/cellwriter/issu...l?id=11#c2
#5
cangy Wrote:the best recogniser for unix I've seen is sharp's crim on the zaurus (linux/qpe), but there's no source...
Interesting... shame it hasn't been ported.

cangy Wrote:
donjorge22 Wrote:And secondly, would anyone be interested in completing the database with good handwriting samples as a collaborative project? It wouldn't take so long if there was more than one person doing it, besides which it's a waste of anyone's time entering samples if there are already some out there.
I had this idea, but it didn't go anywhere: http://code.google.com/p/cellwriter/iss … l?id=11#c2
That's a good idea in there... is it still even being maintained? I tried emailing the author a couple of times, but nada...
#6
cangy Wrote:the best recogniser for unix I've seen is sharp's crim on the zaurus (linux/qpe), but there's no source...
Yes, I thought the recognition on my Zaurus was great. It was partly the Zaurus I had in mind when I said that the commercial recognition engines are still way ahead here.
#7
oh, there's one more: kanjipad (though it's not integrated as an ime)
#8
cangy Wrote:oh, there's one more: kanjipad (though it's not integrated as an ime)
You can use kanjipad as an input method via the gjiten dictionary. I don't remember exactly how gjiten does this, but it runs kanjipad as a separate process and then inputs into the gjiten window, so it must be possible to do that without gjiten too. Note that kanjipad is basically an abandoned project though. It doesn't really compile properly any more, and nobody will fix any bugs, so unless you are confident, you might want to avoid it.

As for the other options, I could never get Tomoe to compile. Plus, the underlying data is seriously wonky, and nobody replies to messages if you report bugs on the Tomoe mailing list, so I'd suggest ditching it. Mathieu Blondel is pretty actively developing Tegaki so it may be the best bet for the future. Zinnia is just an engine without any front end, but I believe Mathieu has it as an option for Tegaki. There is a mailing list at Google Groups so if you are interested you could ask him there.