Capture2Text - Japanese OCR Utility (kanji koohii FORUM, http://forum.koohii.com, Learning Japanese > Learning resources, /thread-6769.html)
Capture2Text - Japanese OCR Utility - Filip - 2012-04-27
I select the image and get the correct kanji in the upper corner. When I paste, however, it often shows me the kanji from the previous selection (exactly one before). Any idea or solution? Cheers

Capture2Text - Japanese OCR Utility - cb4960 - 2012-04-28
Filip Wrote: I select the image and get the correct kanji in the upper corner.
The OCR isn't instantaneous; it depends on the size of the selection and the speed of your PC. Wait a second or two before pasting.

Capture2Text - Japanese OCR Utility - anritsi - 2012-07-20
This is pretty handy. :D

Capture2Text - Japanese OCR Utility - cb4960 - 2012-10-07
I have just posted version 2.1 of Capture2Text.
Download Capture2Text v2.1 via SourceForge (source code is included)
What Changed?
● Added command line options. From the readme:
Code: You may OCR the screen via command line by calling Capture2Text in this format:
Code: Sometimes Capture2Text consistently makes the same OCR mistakes, such as
cb4960

Capture2Text - Japanese OCR Utility - cb4960 - 2012-11-06
I have just posted version 2.2 of Capture2Text.
Download Capture2Text v2.2 via SourceForge (source code is included)
What Changed?
● Upgraded to Tesseract v3.02.02 (see http://code.google.com/p/tesseract-ocr/wiki/ReleaseNotes for the changelist).
● Simplified the special tokens used in the substitution feature a bit and fixed a whitespace bug.
Things of limited interest to Japanese learners:
● Added a whitelist option for Tesseract dictionaries. It allows you to limit the characters that Capture2Text can recognize, for example to digits only.
● Added support for more languages.
The complete list: Afrikaans, Albanian, Ancient Greek, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Catalan, Cherokee, Chinese, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, Frankish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Maltese, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovakian, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese.

Capture2Text - Japanese OCR Utility - cb4960 - 2012-11-09
I have just posted version 2.3 of Capture2Text.
Download Capture2Text v2.3 via SourceForge (source code is included)
What Changed?
● Reverted to Tesseract v3.01 when using the Japanese (Tesseract) dictionary. It is MUCH more accurate than v3.02.02 with vertical text and especially with horizontal text.
● Added an option to remove the capture box before a preview OCR. This is more accurate, particularly with NHocr, but causes the capture box to flicker. It is disabled by default, but you should probably enable it if you use the preview feature with the Japanese (NHocr) dictionary.
● Fixed a bug that caused the text direction (horizontal/vertical) to be ignored for Chinese/Japanese. The bug was introduced in the previous release.
● NHocr is now passed a .ppm image instead of a .pgm image to better handle non-grayscale captures. You might see slightly better accuracy with the Japanese (NHocr) dictionary now.
● Fixed a bug that caused the capture box to stick around after it was supposed to be removed.
● Changed the snapshot enlargement from 300% to 320% to meet Tesseract's minimum recommended DPI.
● Increased the update rate of the capture box to make it appear more fluid.

Capture2Text - Japanese OCR Utility - honeybunch - 2013-01-12
This seems like a pretty amazing tool. Unfortunately, I am totally unable to use it.
Whatever language I'm using and whatever text I select, the preview box stays blank, and nothing is copied to my clipboard. Does anyone have any idea what could be going wrong? I'd really love to be able to use this program, but I'm clueless about how to get it working.

Capture2Text - Japanese OCR Utility - cb4960 - 2013-01-12
honeybunch Wrote: This seems like a pretty amazing tool. Unfortunately, I am totally unable to use it. Whatever language I'm using and whatever text I select, the preview box stays blank, and nothing is copied to my clipboard.
From the readme:
Quote: 1) Unzip the contents of the zip file. Make sure that there are no Asian or
If you did that, try unzipping the program to a very simple directory such as c:\temp and see what happens. Also, it would be helpful if you wrote which operating system you are using (XP, Vista, 7, or 8).

Capture2Text - Japanese OCR Utility - honeybunch - 2013-01-13
cb4960 Wrote: If you did that, try unzipping the program to a very simple directory such as c:\temp and see what happens.
I thought I did that. I had unzipped it to "C:\Documents and Settings\Michael\Desktop\Capture2Text_v2.4", but I guess something in that file path screwed it up. I just unzipped it again to "C:\temp", as you suggested, and now it works great. Guess I'll just keep it there and make a shortcut.
Also, sorry for not mentioning my operating system. I'm usually better about giving that kind of information when I ask for help. I'm on XP Professional with Service Pack 3, if you still want to know. Anyhow, thanks for your help, and for making this program. It's great.

Capture2Text - Japanese OCR Utility - cb4960 - 2013-01-13
honeybunch Wrote: cb4960 Wrote: If you did that, try unzipping the program to a very simple directory such as c:\temp and see what happens.
I thought I did that. I had unzipped it to "C:\Documents and Settings\Michael\Desktop\Capture2Text_v2.4", but I guess something in that file path screwed it up. I just unzipped it again to "C:\temp", as you suggested, and now it works great. Guess I'll just keep it there and make a shortcut.
I'm glad that you got it to work. I'll boot up XP and investigate why it doesn't like that path.

Capture2Text - Japanese OCR Utility - streetsmartlang - 2013-01-28
There are a whole bunch of ways to handle Japanese OCR, both on Mac and Windows. The end of this post on my blog goes over several of them (in the context of getting the lyrics for your Japanese songs as text), from some of the free ways mentioned here to the top-of-the-line $200 Adobe software.

Capture2Text - Japanese OCR Utility - gorghurt - 2013-03-06
Sadly, there's no Linux support. Has anybody tried this with AutoHotkey and Wine (http://appdb.winehq.org/objectManager.php?sClass=version&iId=17738&iTestingId=44317)? I didn't get AutoHotkey to work on my first try, and had no time for more testing. Sadly, the links to bombpersons' script are offline.

Capture2Text - Japanese OCR Utility - cb4960 - 2013-07-05
I have just posted version 2.5 of Capture2Text.
Download Capture2Text v2.5 via SourceForge (source code is included)
What Changed? Just a minor update:
● Updated NHocr from v0.20 to v0.21.
● Now compiled with Ahk2Exe v1.1.11.01 instead of v1.1.05.06.
cb4960

Capture2Text - Japanese OCR Utility - magicz123 - 2013-07-26
Hi cb4960, I really like your software. I just registered on koohii.com to say "Thank you", and I think I have an idea that can improve Capture2Text's OCR accuracy. Improving your ConvertImageFormat.exe so that it removes the background of the captured image and makes it smoother may improve OCR accuracy. You may not know this, but some people use Potrace and mkbitmap to improve the accuracy of Tesseract.
Download from here: http://potrace.sourceforge.net/
They use mkbitmap to remove the color background from an image: http://potrace.sourceforge.net/mkbitmap.html
See this picture, where they removed every color except black (the color of almost all text in images): http://potrace.sourceforge.net/img1/loxie-t3.png
Then they use Potrace to enhance the image quality and make it smooth, which makes it easier for Tesseract to recognize the text and also improves the accuracy of Tesseract's output: http://potrace.sourceforge.net/samples.html
Since you wrote Capture2Text in the AutoHotkey language, you may get some ideas from this script, which is also written in AutoHotkey. It uses NConvert to convert the image to bitmap format, uses mkbitmap to make the image black and white only and remove the background, and then uses Potrace to make the image smoother: http://www.autohotkey.com/board/topic/10188-screengrab-ocr-text-gui-for-optionsresults/?p=431921
Hope you can get some ideas on how to make Capture2Text more accurate.
___________________________
I have another method: use Textcleaner (a script for ImageMagick, http://www.imagemagick.org/script/index.php) to remove the picture background and keep only the text: http://www.fmwconcepts.com/imagemagick/textcleaner/index.php
Then use Tesseract to OCR that image. This improves the output; here is the result, even better than commercial OCR software: http://vbridge.co.uk/wp-content/uploads/2012/10/OCR-test1-1024x512.png
More information about using Tesseract with Textcleaner here: http://www.imagemagick.org/script/index.php
Thank you, I hope I can lend you a hand.
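On the v2.3 change of snapshot enlargement from 300% to 320% "to meet Tesseract's minimum recommended DPI": Tesseract's documentation recommends images of at least 300 DPI. The post does not say what DPI the capture starts at, so this worked sketch assumes a 96 DPI screen (the common Windows default). Under that assumption, 300% lands just under 300 DPI while 320% clears it. This is illustrative arithmetic only, not Capture2Text's code.

```python
# Assumption: captures come from a 96 DPI screen (common Windows default).
# The thread does not state which DPI Capture2Text assumes.
SCREEN_DPI = 96
TESSERACT_MIN_DPI = 300  # Tesseract's recommended minimum


def effective_dpi(enlargement_percent: int, screen_dpi: int = SCREEN_DPI) -> float:
    """DPI of the capture after scaling it by enlargement_percent."""
    return screen_dpi * enlargement_percent / 100


for pct in (300, 320):
    dpi = effective_dpi(pct)
    verdict = "meets" if dpi >= TESSERACT_MIN_DPI else "falls short of"
    print(f"{pct}% -> {dpi:.1f} DPI, which {verdict} {TESSERACT_MIN_DPI} DPI")
```

With a 96 DPI source, 300% gives 288 DPI and 320% gives 307.2 DPI; on a screen with a different DPI the same check would need a different enlargement factor.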
Capture2Text - Japanese OCR Utility - cb4960 - 2013-07-27
@magicz123, thanks for the info! I'll try to experiment with these tools in the coming weeks.

Capture2Text - Japanese OCR Utility - cb4960 - 2013-08-27
I have just posted version 3.0 of Capture2Text.
Download Capture2Text v3.0 via SourceForge (source code is included)
What Changed?
● Added an option to binarize the captured image before sending it to the OCR engine. It is disabled by default. To enable it, either hit Win+b or check the box in Preferences -> Output.
Binarization (aka thresholding) is just the process of converting an image to 1 bpp (i.e., black and white). However, it can DRAMATICALLY improve OCR accuracy for manga and other sources.
(Image: comparison showing the same capture before and after binarization.)
When reading manga, it is usually best to leave binarization enabled. If you find a word that the OCR engine fails on, try using Win+b to toggle binarization OFF to see if that helps (but be sure to toggle it back ON afterwards). In my testing, the difference between having it enabled and disabled was like night and day. I should have added this ages ago. If you were unsatisfied with Capture2Text in the past, you might want to give it another shot with binarization enabled.
(Note: As with previous releases, the default OCR language is English. Press Win+2 to switch the language to Japanese using the primary Tesseract OCR engine, or press Win+1 to switch the language to Japanese using the secondary Japanese NHocr OCR engine, or right-click the Capture2Text icon in the tray and select Japanese that way.)
cb4960

Capture2Text - Japanese OCR Utility - magicz123 - 2013-08-28
Thank you for the update
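The v3.0 notes define binarization as converting the capture to 1 bpp black and white, but they don't say how the black/white cutoff is chosen. Here is a minimal sketch using Otsu's method, a standard global-threshold technique. It is swapped in here as an assumption, not taken from Capture2Text. The image is assumed to be grayscale, stored as a list of rows of 0-255 values, and the output uses 1 for white and 0 for black (also an assumption).

```python
from typing import List

Gray = List[List[int]]  # rows of 0..255 luminance values


def otsu_threshold(img: Gray) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method).

    Pixels <= t form one class (dark) and pixels > t the other (light).
    """
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_bg = 0   # number of pixels with value <= t
    sum_bg = 0.0    # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no dark pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break     # every pixel is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(img: Gray) -> List[List[int]]:
    """Convert to 1 bpp: 1 (white) if above the Otsu threshold, else 0 (black)."""
    t = otsu_threshold(img)
    return [[1 if v > t else 0 for v in row] for row in img]
```

For example, `binarize([[20, 30, 200], [25, 210, 220]])` separates the three dark pixels from the three light ones. A single global threshold suits flat backgrounds. Captures with gradients or uneven lighting tend to do better with local (adaptive) thresholding or the kind of background removal magicz123 described.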
Capture2Text - Japanese OCR Utility - shan109 - 2013-12-09
Will there be a possibility of a Linux port?

Capture2Text - Japanese OCR Utility - cb4960 - 2013-12-10
shan109 Wrote: Will there be a possibility of a Linux port?
I have no plans to port Capture2Text to Linux.

Capture2Text - Japanese OCR Utility - ryuudou - 2013-12-10
The new(er) functionality sounds awesome.

Capture2Text - Japanese OCR Utility - cb4960 - 2014-03-01
I have just posted version 3.1 of Capture2Text.
Download Capture2Text v3.1 via SourceForge (source code is included)
What Changed?
● Improved Japanese OCR accuracy through better image pre-processing and more finely tuned Tesseract configuration options. The previous version was only 70% accurate in my test suite; the new version is 90% accurate.
● Now supports text and backgrounds of any color when OCR pre-processing is enabled. In the previous version, only dark text on a light background was supported.
● Added an option to place the preview text beside the capture box. See Preferences -> OCR -> Preview Box -> Location.
(Note: As with previous releases, the default OCR language is English. Press Win+2 to switch the language to Japanese using the primary Tesseract OCR engine, or press Win+1 to switch the language to Japanese using the secondary Japanese NHocr OCR engine, or right-click the Capture2Text icon in the tray and select Japanese that way.)
cb4960

Capture2Text - Japanese OCR Utility - znebr47625 - 2014-03-22
Just for reference, in case anybody wants to test them, there are 読んde ココ and SmartOCR Lite (I couldn't find the professional version). Those are unbelievably good OCR programs. With 読んde ココ you can OCR an entire light novel that comes as JPEG files in just a few clicks, and it's very precise.

Capture2Text - Japanese OCR Utility - cb4960 - 2014-03-22
arnaldosfjunior Wrote: Just for reference, in case anybody wants to test them, there are 読んde ココ and SmartOCR Lite (I couldn't find the professional version). Those are unbelievably good OCR programs.
With 読んde ココ you can OCR an entire light novel that comes as JPEG files in just a few clicks, and it's very precise.
e.Typist is another good option. The cheaper NEO version supports Japanese and English.

Capture2Text - Japanese OCR Utility - cb4960 - 2014-07-10
I have just posted version 3.4 of Capture2Text.
Download Capture2Text v3.4 via SourceForge (source code is included)
What Changed?
● Added an option to strip furigana. It is enabled by default. To disable it: "Preferences > OCR > Strip Furigana".
The text direction preference affects how this feature operates.
● Added the "Auto" choice to the "Text direction" preference. It is enabled by default. It uses very simple logic: if the width is more than twice the height, the text direction is assumed to be horizontal; otherwise, it is assumed to be vertical. As you can see, it is biased in favor of vertical text.
● Removed the "OCR pre-processing" hotkey option from the Preferences. By default it is now set to the awkward key combination Shift-Ctrl-Windows-B. It can still be edited in settings.ini.
(Note: As with previous releases, the default OCR language is English. Press Win+2 to switch the language to Japanese using the primary Tesseract OCR engine, or press Win+1 to switch the language to Japanese using the secondary Japanese NHocr OCR engine, or right-click the Capture2Text icon in the tray and select Japanese that way.)
cb4960

Capture2Text - Japanese OCR Utility - MarseSnorty - 2014-09-15
Is there any chance of getting a way to use this without a keyboard? I use a convertible tablet, so when I'm lounging around reading, I don't have the keyboard attached and can't really use the OCR. For example: you click the icon in the taskbar and it puts up a transparent grey layer; you then click and drag over that layer to make a selection, which makes the selected area fully transparent (similar to how the Snipping Tool looks when taking snips).
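The "Auto" text-direction rule from the v3.4 notes is simple enough to write down directly. This sketch assumes the capture box size is available as a width and height in pixels. The function name is hypothetical and does not come from Capture2Text's source.

```python
def guess_text_direction(width: int, height: int) -> str:
    """'Auto' rule from the v3.4 notes: horizontal only if width > 2 * height.

    guess_text_direction is a hypothetical name used for illustration.
    """
    if width <= 0 or height <= 0:
        raise ValueError("capture box must have positive width and height")
    # Anything that is not clearly wide is treated as vertical text,
    # which is why the rule is biased in favor of vertical Japanese text.
    return "horizontal" if width > 2 * height else "vertical"
```

A 300x100 box counts as horizontal. A 200x100 box is exactly twice as wide, not more than twice, so it is treated as vertical, as is a tall 80x400 box.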