I have just posted version 2.2 of Capture2Text.
Download Capture2Text v2.2 via SourceForge (source code is included)
What Changed?
● Upgraded to Tesseract v3.02.02 (see http://code.google.com/p/tesseract-ocr/w...leaseNotes for the changelist).
● Simplified the special tokens used in the substitution feature and fixed a whitespace bug.
Things of limited interest to Japanese learners:
● Added a whitelist option for Tesseract dictionaries. It lets you limit the characters that Capture2Text can recognize, for example to digits only.
● Added support for more languages. The complete list:
Afrikaans, Albanian, Ancient Greek, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Catalan, Cherokee, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, Frankish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovakian, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese.
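
For anyone curious what the whitelist option looks like at the Tesseract level, here is a rough sketch. It is not how Capture2Text implements the feature (the post does not say); it only shows the underlying Tesseract config variable, tessedit_char_whitelist, set through a small config file passed on the command line. The helper names, the config file name "whitelist_only", and the image/output paths are made up for illustration, and how Tesseract resolves the config file path may differ between versions.

```python
import os


def write_whitelist_config(chars, directory):
    """Write a Tesseract config file restricting recognition to `chars`.

    The file name "whitelist_only" is a hypothetical choice for this sketch.
    """
    path = os.path.join(directory, "whitelist_only")
    with open(path, "w", encoding="utf-8") as f:
        # tessedit_char_whitelist is the Tesseract variable that limits
        # which characters the recognizer is allowed to output.
        f.write("tessedit_char_whitelist " + chars + "\n")
    return path


def build_tesseract_command(image_path, output_base, lang, config_path):
    """Build (but do not run) a tesseract command line.

    Classic CLI shape: tesseract <image> <outputbase> -l <lang> <configfile>
    """
    return ["tesseract", image_path, output_base, "-l", lang, config_path]
```

To actually run it, you would pass the returned list to subprocess.run() on a machine with Tesseract installed, e.g. with chars set to "0123456789" to recognize digits only.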