Capture2Text - Japanese OCR Utility

#26
cb4960 Wrote:For now, you can use a program that detects the clipboard like StarDict or EBwin
wow. I didn't even realize that EBWin could do this. Thanks!

Is this not the solution that people have been looking for? A good quality J/E and J/J automatic dictionary lookup that can be used in non-browser applications? (I couldn't get StarDict to work well)

And with Capture2Text, I can presumably copy bits of text from some PDFs that otherwise won't let me. These will be very helpful indeed. *deep bow*
#27
Thora Wrote:
cb4960 Wrote:For now, you can use a program that detects the clipboard like StarDict or EBwin
wow. I didn't even realize that EBWin could do this. Thanks!

Is this not the solution that people have been looking for? A good quality J/E and J/J automatic dictionary lookup that can be used in non-browser applications? (I couldn't get StarDict to work well)
Hmmph. Don't diss Stardict! Stardict is what people have been overlooking! Stardict to the death!!1`~
#28
Unfortunately stardict doesn't automatically deconjugate stuff...
#29
FooSoft Wrote:Unfortunately stardict doesn't automatically deconjugate stuff...
It does for me.

http://forum.koohii.com/showthread.php?p...#pid118263
#30
I have just posted version 1.04 of Capture2Text.

Download Capture2Text v1.04 via MediaFire (source code is included)

- Added ability to move the capture box by right-clicking and dragging the mouse.

- Added all languages supported by the Tesseract OCR tool: Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovakian, Slovenian, Spanish, Swedish, Tagalog, Turkish, Ukrainian, and Vietnamese.

By default only Chinese, English, French, German, Japanese, and Spanish are installed. Download any others you may need from the Tesseract site.

Two Japanese dictionaries are included: one uses NHocr and the other uses Tesseract. Both have their advantages; however, the Tesseract version sometimes fails on small selections (one to five characters).

- Created a right-click menu that allows the user to select the language, output type, capture box settings, and scale factor. This is a bit easier than editing settings.ini.

- Removed unnecessary items from settings.ini
Edited: 2010-12-04, 5:47 pm
#31
I have just posted version 1.05 of Capture2Text.

Download Capture2Text v1.05 via MediaFire (source code is included)

- Fixed issue where the checkmarks in the language menu wouldn't disappear.
Edited: 2010-12-04, 5:46 pm
#32
Mmhh.. I don't know which ocr software is better.. It seems that nhocr gets some things that tesseract doesn't get, but tesseract gets some things that nhocr doesn't get...

Anyway, I reworked the Python script I posted earlier to use Tesseract, if anyone wants it: http://pastebay.com/111303
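For readers who cannot reach the link, here is a minimal sketch of how a Python script might drive the Tesseract command-line tool. This is not the script linked above; the function names are hypothetical, and it assumes the `tesseract` executable is on the PATH with the `jpn` language data installed.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="jpn"):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, lang="jpn"):
    """Run Tesseract on an image file and return the recognized text."""
    image_path = Path(image_path)
    # Tesseract appends ".txt" to the output base name by itself.
    output_base = image_path.with_suffix("")
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if Tesseract reports failure
    )
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_image("capture.png")` would write `capture.txt` next to the image and return its contents, provided Tesseract itself runs successfully.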
#33
I have just posted version 1.06 of Capture2Text.

Download Capture2Text v1.06 via MediaFire (source code is included)

- Added OCR language quick access keys. This feature allows you to quickly switch between 3 languages. Here are the default keys and languages:
Windows Key + 1: Switch to Japanese (NHocr)
Windows Key + 2: Switch to Japanese - Alternate (Tesseract)
Windows Key + 3: Switch to English

- For Chinese and Japanese, newlines are deleted (existing behavior). For other languages, newlines are replaced with spaces.
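
The newline rule above can be sketched in Python as follows. This is only an illustration of the described behavior, not the actual AutoHotKey code in Capture2Text; the function name and the plain language names used as keys are assumptions.

```python
# Languages written without spaces between words (hypothetical key names).
CJK_LANGUAGES = {"Chinese", "Japanese"}


def normalize_newlines(text, language):
    """Delete newlines for Chinese/Japanese; replace them with spaces otherwise."""
    # Treat Windows and old Mac line endings the same as "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    replacement = "" if language in CJK_LANGUAGES else " "
    return text.replace("\n", replacement)
```

Deleting the newline keeps a Japanese word that wraps across two lines intact, while an English line break becomes an ordinary word separator.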


Note: For some reason, if there is a period "." in the path to the Capture2Text directory, the .exe might pop up with an error message. This only happens with the .exe and not the .ahk script.
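
A quick way to check an install location for this problem is sketched below. The function name is hypothetical, and the check only flags directory names containing a period; it does not confirm that the .exe will actually fail.

```python
from pathlib import PureWindowsPath


def dotted_directories(install_dir):
    """Return the directory names in install_dir that contain a period."""
    path = PureWindowsPath(install_dir)
    # Skip the drive/root anchor (e.g. "C:\\") and inspect only the folders.
    parts = path.parts[1:] if path.anchor else path.parts
    return [part for part in parts if "." in part]
```

An empty list for a path such as `C:\Temp\Capture2Text` means no folder name contains a period; a non-empty list names the folders worth renaming.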
#34
Hi cb4960,

I'm quite unfamiliar with image recognition software, so I have a question: is it possible to take a page of manga and OCR it without selecting which parts are text? Thanks!
#35
zodiac Wrote:Hi cb4960,

I'm quite unfamiliar with image recognition software, so I have a question: is it possible to take a page of manga and OCR it without selecting which parts are text? Thanks!
I haven't really looked into this too much, but OCRopus looks promising as a back end for the task that you've described.

OCRopus Wikipedia entry.

According to this article, Google uses/maintains it for its book search service.

Edit: But to answer your question, I don't know of any free tool that provides a nice front end to do this.
Edited: 2010-12-13, 3:41 pm
#36
Hi, I know this post is from 6 months ago, but I'll give it a try anyway.

Thanks for posting this thread. However, for me it says that the jpn file for the language is corrupted, and the application doesn't work. It would be really nice of you to explain to me what's wrong and how I can fix it! :)

I would use this to copy and paste kanji into Microsoft Word. :) Thanks
#37
Indochine Wrote:Hi, I know this post is from 6 months ago, but I'll give it a try anyway.

Thanks for posting this thread. However, for me it says that the jpn file for the language is corrupted, and the application doesn't work. It would be really nice of you to explain to me what's wrong and how I can fix it! :)

I would use this to copy and paste kanji into Microsoft Word. :) Thanks
Try putting the extracted files in a very simple directory path, such as "C:\Temp\Capture2Text" (so that the executable would be at "C:\Temp\Capture2Text\Capture2Text.exe"). Now try running Capture2Text.exe. I know it sounds like that shouldn't work, but the tool that converts AutoHotKey scripts to .exe files has some issues in this regard.

Or you could try running the AutoHotKey scripts directly. To do this:
1) Download AutoHotKey_L from http://www.autohotkey.com/download/
2) Install AutoHotKey (choose the Unicode option when it comes up)
3) Move the 2 .ahk files (Capture2Text.ahk and ScreenCapture.ahk) from the "SourceCode\Capture2TextAhkScript" directory to the same directory that contains the .exe file.
4) Double-click Capture2Text.ahk.

If those 2 procedures don't work, maybe you can give me a more exact error message. Does it happen as soon as you start the program? Does it work with the other dictionaries? Also, what version of Windows are you using?

Edit: Also, no worries about posting in any old thread that I have created.
Edited: 2011-06-15, 9:57 pm
#38
None of them worked... In fact when I'm extracting the files right after downloading, there is an error message:
"C:\Users\Downloads\Capture2Text_v106.zip:failure CRC in Capture2Text\Utils\tessract\tessdata\jpn.traineddata. The file is corrupted"

I've already downloaded a working jpn.traineddata file, but when I tried replacing the corrupted one with it, nothing happened.
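
A CRC failure like this usually means the download itself was damaged, and the archive can be checked before extracting. Here is a minimal sketch using Python's standard `zipfile` module; the function name is hypothetical.

```python
import zipfile


def first_corrupt_member(archive_path):
    """Return the name of the first member that fails its CRC check, or None."""
    with zipfile.ZipFile(archive_path) as archive:
        # testzip() reads every member and verifies its CRC and header.
        return archive.testzip()
```

If this returns a file name such as the `jpn.traineddata` entry, downloading the archive again is more likely to help than patching files afterwards. A file that is not a zip archive at all raises `zipfile.BadZipFile` instead.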
#39
Please forgive how rude this may come off (it's not supposed to be), but what's the use of this? If it's a small chunk of text, then isn't it quicker to type it yourself? About the only use I can think of is where you don't know the reading for a character and you want to copy it to an online dictionary (which, admittedly, would be very useful).

Maybe someone here could enlighten me?
#40
What's a recommended comic viewer?
#41
Tolerence91 Wrote:What's a recommended comic viewer?
I'd recommend ComicRack: comicrack.cyolito.com
#42
Tolerence91 Wrote:What's a recommended comic viewer?
Mangameeya if you don't mind it being in Japanese.
#43
squarezebra Wrote:Please forgive how rude this may come off (it's not supposed to be), but what's the use of this? If it's a small chunk of text, then isn't it quicker to type it yourself? About the only use I can think of is where you don't know the reading for a character and you want to copy it to an online dictionary (which, admittedly, would be very useful).

Maybe someone here could enlighten me?
Well, I think that's pretty much the purpose of it. Unless you are already fluent to the point where you don't need to do dictionary lookups, then you probably aren't going to know how to type something just by looking at the kanji.
#44
Indochine Wrote:None of them worked... In fact when I'm extracting the files right after downloading, there is an error message:
"C:\Users\Downloads\Capture2Text_v106.zip:failure CRC in Capture2Text\Utils\tessract\tessdata\jpn.traineddata. The file is corrupted"

I've already downloaded a working jpn.traineddata file, but when I tried replacing the corrupted one with it, nothing happened.
I uploaded it to Media Fire again:
http://www.mediafire.com/?atijxta5ahj5odq

I have been able to download it and successfully unzip it with both 7-zip and WinRAR.
Edited: 2011-06-16, 8:45 pm
#45
Bokusenou Wrote:
Tolerence91 Wrote:What's a recommended comic viewer?
Mangameeya if you don't mind it being in Japanese.
Here is a version in English:

http://www.mydailymanga.com/2009/01/14/m...ya-update/

or this one (supposedly aimed at Vista/Win7 users):

http://www.mydailymanga.com/2011/01/17/m...a-7-users/
Edited: 2011-06-16, 8:52 pm
#46
cb4960 Wrote:
Indochine Wrote:None of them worked... In fact when I'm extracting the files right after downloading, there is an error message:
"C:\Users\Downloads\Capture2Text_v106.zip:failure CRC in Capture2Text\Utils\tessract\tessdata\jpn.traineddata. The file is corrupted"

I've already downloaded a working jpn.traineddata file, but when I tried replacing the corrupted one with it, nothing happened.
I uploaded it to Media Fire again:
http://www.mediafire.com/?atijxta5ahj5odq

I have been able to download it and successfully unzip it with both 7-zip and WinRAR.
Thanks, I can unzip it successfully now! :)
Thanks a lot!
Edited: 2011-06-17, 7:42 am
#47
I tried them all. Thanks, guys. The one I really like is ComicRack, which Oniichan suggested. I just can't save any bookmarks, so I'm upset at the moment that no one at their forums is responding to my problem =/
#48
Hello,

I'm trying to run Capture2Text, but I have problems with both the EXE and the AHK files. If I try to run the EXE, I get the following error:

Quote:Windows cannot access the specified device, path, or file. You may not have appropriate permissions to access the item.
If I try to run the AHK file, I get this error:

Quote:Error at line 34.
Line Text: FileEncoding, UTF-8
Error: This line does not contain a recognized action.
Help?

Thank you!
#49
ojousan Wrote:Hello,
If I try to run the AHK file, I get this error:

Quote:Error at line 34.
Line Text: FileEncoding, UTF-8
Error: This line does not contain a recognized action.
Help?

Thank you!
"FileEncoding" is a command that is only found in AutoHotKey_L. Make sure you didn't download the other non-L version of AutoHotKey by mistake.
#50
Thank you, that was my problem! :)