kanji koohii FORUM
KanjiTomo - New OCR program for Japanese text - Printable Version



KanjiTomo - New OCR program for Japanese text - jessem - 2014-11-18

Just downloaded this and it's pretty awesome - this should really help with my computer reading, thank you so much!

The kana recognition is pretty bad...like surprisingly bad, considering how awesome the kanji recognition is. I know this stuff is really complicated to do. What an odd side effect...

So, thank you very much for making and sharing this!


KanjiTomo - New OCR program for Japanese text - Kurotowa - 2014-11-19

jessem Wrote:The kana recognition is pretty bad...like surprisingly bad, considering how awesome the kanji recognition is. I know this stuff is really complicated to do. What an odd side effect...
There are two reasons why kana recognition is worse than kanji:

1. I'm assuming that the average user of KanjiTomo already knows kana, so I have not put that much effort into optimizing kana recognition

2. There is more variability in fonts and writing styles for kana than kanji; this makes it more difficult to train the program to recognize kana


KanjiTomo - New OCR program for Japanese text - aldebrn - 2014-11-19

Kurotowa Wrote:1. I'm assuming that the average user of KanjiTomo already knows kana, so I have not put that much effort into optimizing kana recognition

2. There is more variability in fonts and writing styles for kana than kanji; this makes it more difficult to train the program to recognize kana
This is very interesting, thanks for explaining! I'm still ironing out my workflow, but I use KanjiTomo not so much for reading as for preparing to read, i.e., compiling sentences and MeCabing them, making lists of kanji to learn, making flashcards out of sentences & vocab & kanji, etc. So more to support intensive reading than extensive reading, which may have been its design. Is there any low-hanging fruit in kana recognition that I could help out by looking at?


KanjiTomo - New OCR program for Japanese text - husui - 2014-11-28

I just want you to know that, two years later, KanjiTomo is still extremely useful for college students like me. If I'm reading a passage with difficult Kanji, I scan the page and use KanjiTomo to make it much more manageable. Really underrated, and deserves more coverage. I wonder why there aren't any other programs with immediate Kanji recognition via mouse input.


KanjiTomo - New OCR program for Japanese text - Murtadha - 2014-12-12

hello
man this app is so good
but I tried to increase the maximum number of characters and it remains at 4. Why is that?
is there no other way?


KanjiTomo - New OCR program for Japanese text - Kurotowa - 2014-12-13

Currently it's not possible to increase the number of detected characters beyond 4.

There are two reasons for this:
- most Japanese words written in kanji can be identified with four characters
- characters are identified in parallel, and since dual- and quad-core CPUs are most common, four is the optimal number for most computers
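As a rough illustration of the second point (not KanjiTomo's actual code, which has not been published in this thread), identifying up to four characters in parallel could be sketched as below. The function `identify_character` and the constant `MAX_CHARACTERS` are hypothetical names made up for this example, and the "recognizer" is a placeholder that only upper-cases a label.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: mirrors the four-character limit described in the post.
MAX_CHARACTERS = 4


def identify_character(image):
    """Hypothetical stand-in for a per-character OCR routine."""
    # A real recognizer would compare the image against reference glyphs.
    # Here the "image" is just a string label so the sketch stays runnable.
    return image.upper()


def identify_parallel(images):
    """Identify at most MAX_CHARACTERS images, one worker per character."""
    selected = images[:MAX_CHARACTERS]
    if not selected:
        return []
    with ThreadPoolExecutor(max_workers=MAX_CHARACTERS) as pool:
        # map() returns results in input order even though the
        # individual calls may finish in any order.
        return list(pool.map(identify_character, selected))


if __name__ == "__main__":
    print(identify_parallel(["a", "b", "c", "d", "e"]))
```

Threads keep the sketch short; in CPython, CPU-bound recognition would need something like `ProcessPoolExecutor` to actually occupy several cores.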

Murtadha Wrote:hello
man this app is so good
but I tried to increase the maximum number of characters and it remains at 4. Why is that?
is there no other way?



KanjiTomo - New OCR program for Japanese text - Murtadha - 2014-12-13

Well, I kind of understand what you said, but there are other cases where what you said doesn't apply.
At least you should make it so that the user can change it, while keeping 4 as the default.


KanjiTomo - New OCR program for Japanese text - Tamba - 2015-01-06

Is there a way to stop KanjiTomo from stealing the focus whenever it identifies something?


KanjiTomo - New OCR program for Japanese text - Kurotowa - 2015-01-07

KanjiTomo should not steal focus (at least on Windows 7), but it does keep the window on top of other windows by default. Is this what you mean?

You could try turning off Automatic OCR mode (using the checkbox); this prevents the program from staying on top. But then you need to use hotkeys to identify characters.

Tamba Wrote:Is there a way to stop KanjiTomo from stealing the focus whenever it identifies something?



KanjiTomo - New OCR program for Japanese text - Cyborg Ninja - 2015-01-07

I don't know if this has been considered already since I only read pages 1 and 5...

Perhaps the program can calculate the average box size or mean, draw that on the screen, disregarding punctuation and chiisai kana, and this would help with problems mentioned like on the first page with 今度. I know it's been a couple of years, so it's probably already been fine-tuned. The program also would need to recognize the way the text is displayed, whether vertical or horizontal. Maybe it will need to recognize the whitespace between lines. This sounds like a good idea and implementable, but I don't know myself how easily it will recognize little kana like ぅ or っ. I wonder. Great program, good job.
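The box-size idea in this post could be sketched roughly as follows. This is only one possible reading of the suggestion, not anything taken from KanjiTomo: boxes are assumed to be `(x, y, width, height)` tuples, and the function name and the `small_ratio` cutoff of 0.6 are invented for the example.

```python
from statistics import median


def typical_box_size(boxes, small_ratio=0.6):
    """Estimate the side length of a full-size character cell.

    boxes: list of (x, y, width, height) tuples (assumed format).
    small_ratio: hypothetical cutoff, expected to be at most 1; boxes whose
    longer side is below this fraction of the median are treated as
    punctuation or small kana and ignored.
    """
    # Use the longer side so that thin characters such as 一 still count
    # as full-size.
    sides = [max(w, h) for (_x, _y, w, h) in boxes]
    if not sides:
        return None
    rough = median(sides)
    full_size = [s for s in sides if s >= small_ratio * rough]
    return sum(full_size) / len(full_size)


if __name__ == "__main__":
    sample = [
        (0, 0, 20, 20), (24, 0, 20, 20), (48, 8, 20, 4),  # last one is like 一
        (72, 0, 22, 21), (96, 10, 8, 8), (108, 14, 6, 6),  # small kana, 。
    ]
    print(typical_box_size(sample))
```

The median is used for the first pass because a handful of tiny boxes would drag a plain average down before they can be filtered out.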


KanjiTomo - New OCR program for Japanese text - Tamba - 2015-01-07

I'm using Windows 7.
After experimenting around a bit, I think I've identified the problem.
When you have Kanjitomo on the same screen as whatever you're reading, it works as expected (the window stays on top, but I can still scroll using the mouse or keyboard), but if the focused window and Kanjitomo are on different screens, you lose the focus whenever the automatic OCR identifies something.
It probably thinks it's no longer "on top" even though it's perfectly visible on the other screen.


KanjiTomo - New OCR program for Japanese text - Kurotowa - 2015-01-12

Cyborg Ninja Wrote:Perhaps the program can calculate the average box size or mean, draw that on the screen, disregarding punctuation and chiisai kana, and this would help with problems mentioned like on the first page with 今度. I know it's been a couple of years, so it's probably already been fine-tuned. The program also would need to recognize the way the text is displayed, whether vertical or horizontal. Maybe it will need to recognize the whitespace between lines.
This is pretty much what the program is doing. There is no single algorithm for identifying character locations, just a set of rules like these that are applied. It works most of the time if there is enough contrast between text and background, but you can also mark characters by drawing the bounding box with the mouse.

If automatic orientation is selected from the settings menu, orientation is detected by comparing character distances in both directions.
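One way to read "comparing character distances in both directions" is sketched below. It is a guess at the general idea rather than the program's real rule set: boxes are assumed to be `(x, y, width, height)` tuples, the function names are invented, and the decision rule (characters sit closer together along the reading direction than across lines) is an assumption.

```python
from statistics import median


def _gaps(boxes, horizontal):
    """Gap from each box to its nearest following neighbour along one axis."""
    gaps = []
    for (x, y, w, h) in boxes:
        best = None
        for (x2, y2, w2, h2) in boxes:
            if horizontal:
                # Neighbour must overlap vertically and start to the right.
                overlaps = y < y2 + h2 and y2 < y + h
                gap = x2 - (x + w)
            else:
                # Neighbour must overlap horizontally and start below.
                overlaps = x < x2 + w2 and x2 < x + w
                gap = y2 - (y + h)
            if overlaps and gap >= 0 and (best is None or gap < best):
                best = gap
        if best is not None:
            gaps.append(best)
    return gaps


def detect_orientation(boxes):
    """Return "horizontal", "vertical", or "unknown" for a set of boxes."""
    h_gaps = _gaps(boxes, horizontal=True)
    v_gaps = _gaps(boxes, horizontal=False)
    if not h_gaps and not v_gaps:
        return "unknown"
    if not v_gaps:
        return "horizontal"
    if not h_gaps:
        return "vertical"
    # Assumed rule: the direction with the smaller typical gap is the
    # reading direction.
    return "horizontal" if median(h_gaps) <= median(v_gaps) else "vertical"


if __name__ == "__main__":
    rows = [(col * 12, row * 18, 10, 10) for row in range(2) for col in range(3)]
    print(detect_orientation(rows))
```

A box compared with itself yields a negative gap and is skipped, so no explicit identity check is needed as long as widths and heights are positive.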


KanjiTomo - New OCR program for Japanese text - Kurotowa - 2015-01-12

Tamba Wrote:I'm using Windows 7.
After experimenting around a bit, I think I've identified the problem.
When you have Kanjitomo on the same screen as whatever you're reading it works as expected(Windows stays on top, but i can still scroll using the mouse or keyboard), but if the focused window and Kanjitomo are on different screens you lose the focus whenever the automatic OCR identifies something.
It probably thinks it's no longer "on top" even though its perfectly visible on the other screen.
Thank you for reporting this issue. However, I have not been able to reproduce it. I can keep KanjiTomo on one screen and the target on the other, but the focus is not being lost. I will try to fix this if possible, but I would need more information to find out when the problem occurs.


KanjiTomo - New OCR program for Japanese text - Tamba - 2015-01-13

I just tried it on my work laptop (also Windows 7) and I can't reproduce it here either. Seems like it's a problem with my setup at home. I'll investigate some more.

In the meantime: how about an option to disable the "stay on top" even when automatic OCR is active?


KanjiTomo - New OCR program for Japanese text - Kurotowa - 2015-02-08

Tamba Wrote:In the meantime: how about an option to disable the "stay on top" even when automatic OCR is active?
If you are still having this issue, download the program again. I have added an option to disable stay on top: set DISABLE_KEEP_ON_TOP=1 in config.txt
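For readers unsure what such a setting looks like in practice, here is a small sketch of reading a `KEY=VALUE` option of this kind. How KanjiTomo itself parses config.txt is not stated in the thread, so the `#` comment handling, the default of `0`, and both function names are assumptions made for the example.

```python
def parse_config(text):
    """Parse KEY=VALUE lines into a dict.

    Blank lines, lines starting with '#', and lines without '=' are
    skipped; the comment syntax is an assumption, not documented behaviour.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def keep_on_top_disabled(options):
    """True when DISABLE_KEEP_ON_TOP is set to 1 (assumed default: 0)."""
    return options.get("DISABLE_KEEP_ON_TOP", "0") == "1"


if __name__ == "__main__":
    sample = "# sample config.txt contents\nDISABLE_KEEP_ON_TOP=1\n"
    print(keep_on_top_disabled(parse_config(sample)))
```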


KanjiTomo - New OCR program for Japanese text - Tamba - 2015-02-08

Thanks for the effort, but that doesn't help either.

The only difference with the new option is:
When the option is set to 0 the window is always on top, and no window that's not also 'always on top' can get on top of it.
With the option set to 1, Kanjitomo can be in the background, but comes to the foreground when it recognizes a character.

In both cases (with the new option enabled or disabled), I can get the program into a state where it doesn't steal the focus by clicking the window I want to be active until Kanjitomo stays back. When it's in that state, Kanjitomo's icon on the taskbar flashes a few times, then stays lit.
This lasts until I manually focus on Kanjitomo. It's probably a timing thing where I click the active window just when/after Kanjitomo gets the focus.

The problem isn't caused by any specific application either. It happens with web browsers, Explorer windows, games...
It's not even the two monitors-thing I suspected earlier. This also happens when Kanjitomo and the active application are on the same monitor.

edit: That means, the problem isn't actually the "always on top"-attribute of the window, it's whatever brings the window to the front when it recognizes a character.


KanjiTomo - New OCR program for Japanese text - Tenzoku - 2015-03-13

This is a great program, to be sure. Reading would be so much slower without it in cases where I can't use Rikaisama. Never had a problem with it on Windows or Linux aside from the random crashes. However, I'm having difficulties getting it to work on OS X. I installed the JDK, but nothing happens when I try to run the .jar file. When I try to open it through the command line I get the message: "Error: Unable to access jarfile KanjiTomo.jar". Anyone have any ideas what that could be about?

Edit: Never mind, I got it. For anyone who may have the same problem and stumble across this post in the future, I ran "java -Xmx1000m -jar KanjiTomo.jar -run" in a terminal from the KanjiTomo directory.


KanjiTomo - New OCR program for Japanese text - Kurotowa - 2015-04-12

I found one more line in the code where focus is grabbed. It's now disabled by the DISABLE_KEEP_ON_TOP option; try downloading the new version and see if it helps.

Tamba Wrote:edit: That means, the problem isn't actually the "always on top"-attribute of the window, it's whatever brings the window to the front when it recognizes a character.



KanjiTomo - New OCR program for Japanese text - Kurotowa - 2015-04-12

I have released a new version of KanjiTomo. There is one new feature: support for the Chinese language. A Chinese dictionary is included, but the names dictionary is only for Japanese.

To enable Chinese support, CHINESE_DICTIONARY must be set to 1 in config.txt. There are also options for traditional or simplified characters and pinyin tone marks.

The new version can be downloaded from: http://www.kanjitomo.net


KanjiTomo - New OCR program for Japanese text - gdaxeman - 2015-04-12

Kurotowa Wrote:I have released a new version of KanjiTomo. There is one new feature: support for the Chinese language. A Chinese dictionary is included, but the names dictionary is only for Japanese.
Amazing, that's great for reading some manhua in Chinese. In my initial tests, it seems to work really well.

About the names, I think there's not a dedicated dictionary for them in Chinese; variant readings for surnames are usually added together with other entries in regular dictionaries. Also, Chinese speakers usually read the kanji in Japanese names with their Chinese pronunciation, which 'mitigates' the problem (and makes them all unrecognizable...)


KanjiTomo - New OCR program for Japanese text - Tamba - 2015-04-16

Kurotowa Wrote:I found one more line in the code where focus is grabbed. It's now disabled by the DISABLE_KEEP_ON_TOP option; try downloading the new version and see if it helps.
That did it, KanjiTomo has stopped stealing the focus now.
Thanks for the change.


KanjiTomo - New OCR program for Japanese text - Stansfield123 - 2015-08-03

Kanjitomo.net is down. If anyone uploads this to some place else, I'll be sure to say thank you... (unless it's just a temporary problem and the site is up by tomorrow, in which case I'll be sure to say thanks for nothing).


KanjiTomo - New OCR program for Japanese text - kraemder - 2015-08-08

For any Mac users out there who tried KanjiTomo but gave up: I stumbled on a thread explaining how to make it work.

http://www.reddit.com/r/osx/comments/2yxq1f/some_questions_about_my_new_macbook_pro_mostly/

So happy I found that. Like the person asking the question, Kanjitomo was one of the few reasons I wanted to keep Windows kicking around. The other being subs2srs. I'm working on a way to get that going on my mac as well though.


KanjiTomo - New OCR program for Japanese text - stephenmac7 - 2015-08-10

Is there any chance you would release the source code for this?


KanjiTomo - New OCR program for Japanese text - Kurotowa - 2015-08-10

stephenmac7 Wrote:Is there any chance you would release the source code for this?
I might release the source code for KanjiTomo's core OCR algorithm at some point. First I would need to polish the code so that it's easier to read, and write some documentation, but it's certainly something I would like to do. But I can't promise any timeframe.