
Convert Kindle books to txt

#1
What is the best way to convert a Kindle ebook from mobi to txt? I want to do this so I can use cb's Japanese analysis tool.
#2
You can try http://www.zamzar.com/ for quick conversions.
#3
If you use Calibre to convert it from mobi, epub, etc. to zip, you will find the chapters as HTML documents (or all in a single document). If the book has furigana, that is the better option, because I assume that otherwise they would either be ignored or simply placed next to their respective kanji, making the book slightly less readable.
Edited: 2015-08-24, 7:04 pm
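For the zip route, here is a rough Python sketch (standard library only) of pulling plain text out of the exported HTML while dropping the furigana. It assumes the book marks furigana with standard <ruby>/<rt>/<rp> tags, that the HTML files are UTF-8 with explicit closing tags, and that sorting the file names gives reading order. None of that is guaranteed for every book, and the function names are my own, not anything Calibre provides.

```python
import zipfile
from html.parser import HTMLParser


class RubyStrippingParser(HTMLParser):
    """Collect visible text, skipping furigana in <rt>/<rp> and any <script>/<style>."""

    SKIP_TAGS = {"rt", "rp", "script", "style"}
    BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a tag whose text we drop

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # end of a paragraph-like element

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Turn one HTML chapter into plain text without furigana."""
    parser = RubyStrippingParser()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)  # drop blank lines


def zip_to_text(zip_source) -> str:
    """Concatenate every .html/.htm/.xhtml file in a Calibre zip export.

    Assumes UTF-8 files and that sorted file names follow reading order.
    """
    chunks = []
    with zipfile.ZipFile(zip_source) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith((".html", ".htm", ".xhtml")):
                chunks.append(html_to_text(zf.read(name).decode("utf-8")))
    return "\n".join(chunks)


if __name__ == "__main__":
    sample = "<p><ruby>漢字<rp>(</rp><rt>かんじ</rt><rp>)</rp></ruby>を読む。</p>"
    print(html_to_text(sample))  # 漢字を読む。
```

If you want to keep the furigana instead, you could append the <rt> text in brackets rather than skipping it.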
#4
You can use Calibre (+ DRM removal addon) to convert between the azw3, mobi, epub, txt, etc. formats. Calibre also allows you to specify a regex to remove Aozora-formatted furigana.

More on this here: http://forum.koohii.com/showthread.php?tid=12739
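As an illustration of that kind of regex (not the exact expression from the linked thread), here is a Python sketch that strips Aozora Bunko-style ruby, where the reading is wrapped in 《》 right after the kanji and a full-width ｜ optionally marks where the base text starts. It assumes the book uses exactly those characters; a literal ｜ elsewhere in the text would also be removed.

```python
import re

# Aozora Bunko-style ruby: the reading sits in 《》 right after the kanji,
# and a full-width ｜ (U+FF5C) may mark where the base text starts.
RUBY_READING = re.compile(r"《[^》]*》")
RUBY_BASE_MARK = re.compile(r"｜")


def strip_aozora_ruby(text: str) -> str:
    """Remove Aozora-style furigana, keeping only the base text."""
    text = RUBY_READING.sub("", text)    # drop the readings
    return RUBY_BASE_MARK.sub("", text)  # drop the base-text markers


if __name__ == "__main__":
    print(strip_aozora_ruby("｜東京駅《とうきょうえき》で漢字《かんじ》を読む。"))
    # 東京駅で漢字を読む。
```

The same pattern, 《[^》]*》, could presumably go into Calibre's search & replace settings during conversion, with ｜ handled by a second replace.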
#5
Aww, thank you! It seems that Calibre is the best option!
If I use the DRM removal, will I be able to share the book with friends? (Just saying... lol)
#6
You'll be able to share the book, but I wouldn't necessarily do it. There may be other anti-piracy features in the files. For example, Pottermore sells their ebooks without DRM, but they put data in the book that can identify you - they encode it in images etc. So if someone leaks their ebooks online, they'll know who did it, and can in theory take action against that person.
The surest way to remove any such data is to convert the book to plaintext. Unfortunately, that will also remove all formatting, pictures, and possibly chapter titles (depending on how you do the conversion). In the case of Japanese, you'll also need to take steps to ensure furigana doesn't just get mixed into the text (Calibre can do this). Personally, I do this with my ebooks, because plaintext is best suited for the learning tools I use.
Another option is to convert the ebook into an open ebook format, such as epub. Anything other than plaintext is going to have metadata, but this metadata will likely be incompatible with Amazon's format, and the anti-piracy metadata will likely get lost in the process, unless they put it in pictures. I recommend removing pictures.
#7
This is why I stopped buying Kindle books. I can only read them on my computer (unless I somehow convert them to another format).

I guess Kindle books are cheaper than physical books because you're more limited in what you can do with them (e.g., photocopy pages, lend them to people).

I looked into converting Kindle books to pdf files. You either have to download and install software on your computer (and who knows what that does to your computer) or upload your Kindle files to a website (which promises not to keep the Kindle book you uploaded; yeah, right).
#8
john555 Wrote:You either have to download and install software on your computer (and who knows what that does to your computer).
Yep that crazy newfangled dangerous computer software

EDIT: Just to post something that might actually be useful for the thread

Green_Airplane Wrote:You'll be able to share the book, but I wouldn't necessarily do it. There may be other anti-piracy features in the files.
Indeed, I believe there is metadata contained in the kindle books that will identify the purchaser. The DRM removal addon for calibre does not remove this metadata. Of course, as Green_Airplane said you can simply just convert to plaintext and guarantee safety. I also use plaintext for my own learning tools.
Edited: 2015-09-03, 10:23 am
#9
I'm mostly against sharing copyrighted material, so I don't think I'll do it; it was more in case I wanted to send it "privately" to a couple of friends who don't have a Kindle.

I managed to convert it to both epub (with furigana) and txt (without furigana), thank you guys!
#10
Flamerokz Wrote:
john555 Wrote:You either have to download and install software on your computer (and who knows what that does to your computer).
Yep that crazy newfangled dangerous computer software
A lot of software is buggy and if you install it on your computer your computer is slowed down and keeps crashing until you do a system restore back to a restore point from before you installed it. It's happened to me. Maybe you've just been lucky.

Flamerokz Wrote:EDIT: Just to post something that might actually be useful for the thread

Green_Airplane Wrote:You'll be able to share the book, but I wouldn't necessarily do it. There may be other anti-piracy features in the files.
Indeed, I believe there is metadata contained in the kindle books that will identify the purchaser. The DRM removal addon for calibre does not remove this metadata. Of course, as Green_Airplane said you can simply just convert to plaintext and guarantee safety. I also use plaintext for my own learning tools.
This is easy to fix. Print out your illegal copy, then scan it as a pdf. Then no more metadata.
#11
john555 Wrote:This is easy to fix. Print out your illegal copy, then scan it as a pdf. Then no more metadata.
But then it's not text anymore.

Op Wrote:I want to do this so I can use cb's japanese analysis tool.
#12
john555 Wrote:This is easy to fix. Print out your illegal copy, then scan it as a pdf. Then no more metadata.
... Well that's certainly one way to go about it.
#13
john555 Wrote:
Flamerokz Wrote:
john555 Wrote:You either have to download and install software on your computer (and who knows what that does to your computer).
Yep that crazy newfangled dangerous computer software
A lot of software is buggy and if you install it on your computer your computer is slowed down and keeps crashing until you do a system restore back to a restore point from before you installed it. It's happened to me. Maybe you've just been lucky.

Flamerokz Wrote:EDIT: Just to post something that might actually be useful for the thread

Green_Airplane Wrote:You'll be able to share the book, but I wouldn't necessarily do it. There may be other anti-piracy features in the files.
Indeed, I believe there is metadata contained in the kindle books that will identify the purchaser. The DRM removal addon for calibre does not remove this metadata. Of course, as Green_Airplane said you can simply just convert to plaintext and guarantee safety. I also use plaintext for my own learning tools.
This is easy to fix. Print out your illegal copy, then scan it as a pdf. Then no more metadata.
john555, I understand what you're saying, but Calibre doesn't cause those issues, so I suggest you try it :P

There is no need to scan it because, as someone already said, converting it to plain text is enough. It is a good solution for pictures, though (in some books the images may be needed to better understand the plot, e.g. maps).
#14
john555 Wrote:A lot of software is buggy and if you install it on your computer your computer is slowed down and keeps crashing until you do a system restore back to a restore point from before you installed it. It's happened to me. Maybe you've just been lucky.
If you indiscriminately install whatever software you find on the web, it may happen, but if you stick to high-quality open-source software such as Calibre, downloaded from reputable sources, you will most likely never have any problems.
#15
gdaxeman Wrote:
john555 Wrote:A lot of software is buggy and if you install it on your computer your computer is slowed down and keeps crashing until you do a system restore back to a restore point from before you installed it. It's happened to me. Maybe you've just been lucky.
If you indiscriminately install whatever software you find on the web, it may happen, but if you stick to high-quality open-source software such as Calibre, downloaded from reputable sources, you will most likely never have any problems.
"I swear picture.jpg.exe was totally legit at the time!"
Edited: 2015-09-04, 3:19 pm
#16
yogert909 Wrote:
john555 Wrote:This is easy to fix. Print out your illegal copy, then scan it as a pdf. Then no more metadata.
But then it's not text anymore.

Op Wrote:I want to do this so I can use cb's japanese analysis tool.
The full Adobe program has an OCR tool that makes your pdf file searchable. So maybe that would allow the Japanese analysis tool to work.
#17
john555 Wrote:The full Adobe program has an OCR tool that makes your pdf file searchable. So maybe that would allow the Japanese analysis tool to work.
Adobe Acrobat is very expensive (US$ 449.00, or a monthly subscription) and its OCR for Japanese is not that great; it makes too many mistakes, not to mention that Adobe apps are sometimes known for slowing down your PC for one reason or another. There are better and less expensive OCR apps, but they still perform much worse than converting the mobi file directly to text.
#18
gdaxeman Wrote:Adobe apps are sometimes known for slowing down your PC for one reason or another.
If you have this problem on a Mac, the Adobe apps might be hogging memory and not giving it back. Try running sudo purge in the terminal and/or lowering the memory usage for Adobe apps in the preferences.

And yeah, if you print something only to scan and OCR it, you're going to get a lot of mistakes, even with the best OCR software.
#19
yogert909 Wrote:
gdaxeman Wrote:Adobe apps are sometimes known for slowing down your PC for one reason or another.
If you have this problem on a Mac, the Adobe apps might be hogging memory and not giving it back. Try running sudo purge in the terminal and/or lowering the memory usage for Adobe apps in the preferences.

And yeah, if you print something only to scan and OCR it, you're going to get a lot of mistakes, even with the best OCR software.
That's true. In the past I've tried photocopying pages of a Japanese book, then scanning and OCR'ing the pages so that I could get the kanji into a spreadsheet. There are always tons of errors.
#20
john555 Wrote:I've tried photocopying pages of a Japanese book, then scanning and OCR'ing the pages so that I could get the kanji into a spreadsheet. There are always tons of errors.
The spine is what causes a lot of those errors. I cut the spines off my Assimil books and OCR'd them ages ago and didn't have too many problems. I don't remember what software I used.

Like others have already said, that seems like a bit too much work for the intended purpose though.
Edited: 2015-09-05, 7:59 am