kanji koohii FORUM
Copying Japanese text from pdf-files? - Printable Version




Copying Japanese text from pdf-files? - chochajin - 2009-11-22

Hi,

just another quick question (sorry to bother you all today).
I was wondering whether I can also copy Japanese text from pdf-files and paste it into Anki? So far I can only copy and paste alphabetic text.
If I could, that would save me quite some time.


Copying Japanese text from pdf-files? - zazen666 - 2009-11-22

Yes. Use Gmail: email it to yourself and open it as HTML.


Copying Japanese text from pdf-files? - Jarvik7 - 2009-11-22

That only works with PDF files that already contain the data as text. If you can't copy it in a PDF reader, you can't in Gmail either. The Gmail trick is for circumventing DRM, not for applying OCR.


Copying Japanese text from pdf-files? - chochajin - 2009-11-22

The file is over 70 MB, so sending it by email is not really an option anyway.

Any other options?


Copying Japanese text from pdf-files? - ahibba - 2009-11-22

70 MB = scanned book (the pages are images, not text).

You can only copy from it using OCR.


Copying Japanese text from pdf-files? - bebio - 2009-11-22

ABBYY FineReader, which supports Japanese.

However, depending on the quality of the PDF, the results may vary tremendously.
And it does not circumvent DRM.
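
Since results vary so much, a rough sanity check on the OCR output can help before pasting it into Anki. Below is a minimal Python sketch, not part of any OCR tool: the function name `japanese_ratio` is made up for illustration, and it assumes that counting hiragana, katakana, and common kanji is a good enough signal. Punctuation and rarer kanji blocks are not counted, so treat the number as a hint only.

```python
def japanese_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that are kana or kanji."""
    def is_japanese(ch: str) -> bool:
        code = ord(ch)
        return (
            0x3040 <= code <= 0x309F      # hiragana
            or 0x30A0 <= code <= 0x30FF   # katakana
            or 0x4E00 <= code <= 0x9FFF   # CJK unified ideographs (common kanji)
        )

    # Ignore spaces and line breaks so page layout does not skew the ratio.
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(is_japanese(ch) for ch in chars) / len(chars)


if __name__ == "__main__":
    # Hypothetical sample; in practice this would be the text your OCR program produced.
    sample = "これは日本語のテキストです"
    print(japanese_ratio(sample))
```

A ratio near zero on a page that should be Japanese suggests the OCR ran with the wrong language setting or the scan is too poor to recognise.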


Copying Japanese text from pdf-files? - chochajin - 2009-11-25

Thanks for the comments.
Excuse my stupid question, but what are "OCR" and "DRM"?


Copying Japanese text from pdf-files? - LaLoche - 2009-11-25

OCR: Optical Character Recognition software looks at an image and turns the pictures of letters in it into text that you can manipulate. So it can take a pdf file or a scan of a page of a book and turn it into something that you can use in Word, for example.

DRM: Digital Rights Management software protects the content of a file so that it cannot be copied or manipulated.

Hope that helps. I'm not a computer person, but that's my understanding. Some other folks here can explain it more clearly, I'm sure.