(2017-01-20, 7:55 pm)Zarxrax Wrote: eslang: here is Hibana sub images: https://www.mediafire.com/?p55sm9tgbp2bw9z
Today I was able to whip up my own homemade OCR using some APIs available in windows 10. It seems pretty decent but not as good as that google OCR. Definitely better than tesseract though. I'll investigate it further to see if this might be viable to batch convert the whole series, since you have to pay to run the google thing on more than 1000 images.
Hopefully, your home-brew OCR can whip up some decent stuff. Looking forward to seeing it later.
(2017-01-20, 8:33 pm)zx573 Wrote: I need to use my Google Cloud Platform account for work-related test development for now (writing an OCR-based program which is why I started learning the API to begin with) so I don't really want to run a bunch of subtitles through it, but if anyone wants to make their own account and get an API key to plug into the script I posted, it's easy to sign up for a free trial. When you sign up for the trial it gives you $300 free Google Cloud Platform credit that's valid for 2 months. If it's nearing the date my money expires and I still have money left over then I'd be willing to run some OCR for you guys. I managed to burn through $5.23 mostly from testing the OCR app already.
Thanks for the information on the Google Cloud Platform credit system.
Oh wow, just two test files and it has already burned through $5.23! 高いですね。 (That's expensive.)
So far, johndoe2015 has requested the first two episodes of Terrace House. But let us wait a little while for juniperpansy's reply.
It seems that your updated program is getting better at OCR. すごい！ (Impressive!)
Here is a simple breakdown for the random episode of Tokyo Stories.
TOTAL LINES = 323
TOTAL ERRORS = 26 (8.05%)
Minor Errors = 2 (0.62%)
Major Errors = 18 (5.57%)
Critical Errors = 6 (1.86%)
<2>ERRORS : person name in kanji
[2 Minor Errors]
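<16>ERRORS : big つ / ツ / え (full-size kana where the small っ / ッ / ぇ was expected)
<1>ERRORS : big や (full-size kana where the small ゃ was expected)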
<1>ERRORS : missing ー
[18 Major Errors]
<6>ERRORS : wrong or missing word
[6 Critical Errors]
"False Syllables" (字余り；ダブリ字) identified in 19 Lines (5.88%) are not counted as errors.
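For anyone who wants to double-check the percentages, here is a minimal Python sketch that recomputes them from the counts above. The variable and function names are my own (hypothetical), and it assumes every rate is simply the count divided by the 323 total lines, rounded to two decimals.

```python
# Counts copied from the breakdown above.
TOTAL_LINES = 323
ERRORS = {"minor": 2, "major": 18, "critical": 6}
FALSE_SYLLABLE_LINES = 19  # reported separately, not counted as errors


def pct(count, total=TOTAL_LINES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


total_errors = sum(ERRORS.values())
rates = {name: pct(n) for name, n in ERRORS.items()}
overall = pct(total_errors)
false_syllable_rate = pct(FALSE_SYLLABLE_LINES)

print(f"total errors: {total_errors} ({overall:.2f}%)")
for name, rate in rates.items():
    print(f"{name}: {ERRORS[name]} ({rate:.2f}%)")
print(f"false syllables: {FALSE_SYLLABLE_LINES} ({false_syllable_rate:.2f}%)")
```

The same helper can be reused for another episode by swapping in that episode's line count and error tallies.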
（Random episode of Tokyo Stories）
Proofread & Edited:
Edited: 2017-01-20, 8:51 pm