Hi folks,
I recently purchased a Kindle Paperwhite and needed a way to easily import unknown words I find in the books I read without having to interrupt my reading and write the word down on paper or on my computer.
The Kindle has a feature that lets you highlight sections of the text (or in this case, words) you're reading and adds the highlight to a text file located on the Kindle in the documents folder (documents\My Clippings.txt). This text file keeps track of which book the highlights were made in, at what approximate location in the text, and what was highlighted.
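For anyone curious what reading that file involves, here is a minimal Python sketch (not the code from my C# tool). It assumes the usual layout, where each entry is a title line, a metadata line, a blank line and then the highlighted text, with entries separated by a line of `==========`. The metadata wording differs between firmware versions, so it is kept as a raw string. The name `parse_clippings` is just for illustration.

```python
from collections import defaultdict

SEPARATOR = "=========="  # assumed entry delimiter in My Clippings.txt


def parse_clippings(text):
    """Group highlighted snippets by book title.

    Assumes each entry is: title line, metadata line, blank line, highlight text.
    """
    by_book = defaultdict(list)
    for entry in text.split(SEPARATOR):
        # Drop a possible byte-order mark and stray whitespace on each line.
        lines = [ln.strip("\ufeff \r\t") for ln in entry.strip().splitlines()]
        lines = [ln for ln in lines if ln]
        if len(lines) < 3:
            continue  # bookmark or malformed entry with no highlighted text
        title, meta, highlight = lines[0], lines[1], " ".join(lines[2:])
        by_book[title].append({"meta": meta, "text": highlight})
    return dict(by_book)
```

Calling `parse_clippings(open(path, encoding="utf-8").read())` would give a dictionary keyed by book title, which is roughly what a "Book Name" dropdown needs.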
I created a tool that is meant to work alongside Rikaisama to let you easily import into Anki the words you highlighted, along with the sentences you read them in. The only requirement is having a text or HTML copy of the book (i.e. the tool cannot read .MOBI files directly). If you do not have a text/HTML copy, you can probably extract an HTML copy of the book using a tool such as KindleUnpack.
The tool generates a temporary HTML file containing the words + sentences and opens them up in your default browser (which is probably Firefox if you are using Rikaisama often). The produced HTML file looks like this:
The round icon on the left side is a link to Google.co.jp with results filtered to Japanese-language pages. I use this to check how common a word is before adding it (e.g. if a word only comes up with 100k results, I rarely add it to Anki, but I tend to add anything that returns 500k and above). At this point, all you have to do is mouse over the red words (which correspond to the text you highlighted) with Rikaisama turned on, press S, and you're done. Note that the sentence-finding is approximate: the program might find the word in a different sentence from the one you highlighted it in, but most of the time it should be alright (especially with uncommon words).
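To show why the sentence-finding is only approximate, here is a small Python sketch of one way to do it (an illustration, not the tool's actual C# code). It assumes sentences end at 。, ！, ？, ! or ? and simply returns the first sentence that contains the word, so a common word can easily match an earlier sentence than the one you highlighted. The function names are made up for this example.

```python
import html
import re

# A "sentence" is a run of characters up to sentence-ending punctuation
# or a line break (an assumption for this sketch).
SENTENCE = re.compile(r"[^。！？!?\n]+[。！？!?]?")


def find_sentence(book_text, word):
    """Return the first sentence containing `word`, or None if it never occurs."""
    for match in SENTENCE.finditer(book_text):
        sentence = match.group().strip()
        if word in sentence:
            return sentence
    return None


def highlight_html(sentence, word):
    """Wrap the first occurrence of `word` in a red <span> for the HTML page."""
    escaped_sentence = html.escape(sentence)
    escaped_word = html.escape(word)
    marked = '<span style="color:red">' + escaped_word + "</span>"
    return escaped_sentence.replace(escaped_word, marked, 1)
```

Using the approximate location stored in the clippings file to narrow the search would reduce mismatches, but that needs a mapping from Kindle locations to text offsets, which this sketch does not attempt.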
The program itself looks like this:
When you open it up, it tries to detect your Kindle if it is plugged in. If it can't, you will have to manually select a "My Clippings.txt"-format file in the first field.
-The "Book Name" field is a list of different books for which you've added highlights on your Kindle - you need to select the book you are interested in importing into Anki.
-The "Book File" field is the text/HTML version of the book you were reading on your Kindle.
-The "Book File Encoding" field defines whether the text/HTML file is written in Shift-JIS encoding or UTF-8. Autodetect should work for most cases, so there is no need to worry about this particular field.
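For reference, a simple way to autodetect between those two encodings is to try UTF-8 first and fall back to Shift-JIS, since Shift-JIS Japanese text almost never happens to be valid UTF-8. The Python sketch below shows that idea; it is an assumption about how such a check can work, not a copy of what the tool does, and `decode_book` is a hypothetical name.

```python
def decode_book(raw_bytes):
    """Decode book bytes, trying UTF-8 (with optional BOM) before Shift-JIS.

    Returns a (text, encoding_name) tuple.
    """
    for encoding in ("utf-8-sig", "shift_jis"):
        try:
            return raw_bytes.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("file is neither valid UTF-8 nor valid Shift-JIS")
```

You would read the book with `open(path, "rb").read()` and pass the bytes in. A file in some third encoding raises an error rather than silently producing garbage.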
Once you've selected everything, you can simply hit the "Generate" button and the generated HTML file will pop up in your browser, and you can easily import with Rikaisama from there.
If anybody is worried/interested, the code (C#) is here: http://pastebin.com/1fgQwJ3L
It's something I put together last evening in a couple of hours, so don't expect the code to be all that clean.
And lastly, the program itself is here (it requires .NET 4.0 to be installed on the computer): http://dl.dropbox.com/u/44590151/ClippingsToAnki.zip
Hopefully someone finds this useful!
Thanks for writing this. I edited your code for my purposes, though, since I use Epwing2Anki for adding vocab, not Rikaisama. =/
Now I just need to read more, ahahaa~
Hi folks,
This discussion is of prime interest to me. I'm the recent owner of a Kindle Paperwhite, which lets me create vocabulary cards (http://g-ec2.images-amazon.com/images/G … /showme...) on my Paperwhite when I highlight a specific word while reading.
My question is: how can I import my Paperwhite vocabulary cards into Anki?
That way, I could:
1 - adjust the "answer" and not depend on only the definition automatically provided by the default Paperwhite dictionary.
2 - keep on benefiting from the excellent SRS from Anki.
If I'm talking science fiction here, what other option do I have, except manually recreating my hundreds of Paperwhite cards in Anki, which is obviously not the ideal flow? I'm trying to establish an automatic flow back and forth between my Paperwhite and Anki.
(Important note: I'm in no way a developer or programmer, and the details given above are unfortunately of little help to me.)
Thanks to the great community.
- Open /system/vocabulary/vocab.db with any SQLite software
- Export as csv file
- (modify fields if required)
- Import in Anki
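If you would rather script those steps than click through an SQLite program, something like the following Python sketch should do the export. It assumes the database has a table called `words` (check with your SQLite program first); the function name and the output path are placeholders.

```python
import csv
import sqlite3


def export_table_to_csv(db_path, csv_path, table="words"):
    """Dump every row of one table to a UTF-8 CSV file with a header row."""
    connection = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = connection.execute("SELECT * FROM " + quoted)
        headers = [column[0] for column in cursor.description]
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(headers)
            writer.writerows(cursor)
    finally:
        connection.close()
```

It is safer to copy vocab.db off the Kindle first and run this on the copy, e.g. `export_table_to_csv("vocab.db", "words.csv")`.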
Last edited by comeauch (2013 December 29, 10:36 am)
Thanks Comeauch, but there is no vocab.db on my Kindle Paperwhite (version 2). I have looked in each folder and subfolder, and I have done automatic searches, but there is none.
For each book, I have a dedicated folder with files having the following extensions: BMP, MBP1 and MBS.
Regarding the "My Clippings.txt", it contains the PHRASES I highlighted during my readings, but not the SINGLE WORDS. The single words are the ones automatically used by the Kindle Vocabulary Builder. I have thousands of those words but don't know where, or in what file, they are saved on the Kindle Paperwhite. I can find no trace of any vocab.db file.
Again, I'm not a programmer.
I should find out what SQLite software is and how to use it, but my question remains: where are my vocabulary cards on the Kindle Paperwhite?
Thanks for your help.
Last edited by YogaSpirit (2013 December 29, 11:40 am)
I also have a 2nd gen paperwhite and sure enough, all the single words I looked up by selecting them are in this vocab.db file (those are the ones I also see as flashcards on the paperwhite). Connect your kindle to your computer and assuming you're on Windows, go to My Computer -> Kindle -> system -> vocabulary and there should be a vocab.db file (maybe you have to show hidden files/folders?). If it isn't there, then sorry! I don't know! XD Maybe this file is something Calibre does? Are you using Calibre (an ebook management software)?
However, it seems that the definition isn't included as text; rather, it points to a location in the dictionary file. It's still useful though. For example, I looked up "腕組" on my Kindle and this appears (among a few other fields) in the vocab.db file:
id: ja:腕組
word: 腕組
stem: うで‐ぐみ
lang: ja
category: 0
timestamp: 1387947332934
book_key: CR!DVWG5XMNWX26S4SX3RQ50NJ7W9RJ:BCACB6B7 (how the Kindle refers to the book this word was from)
dict_key: B00771M8JQ
pos: 7172 (<- sadly, this would be your definition at position "7172" in the dictionary.... not very useful!)
usage: これから百年の間こうして待っているんだなと考えながら、腕組をして、丸い墓石を眺めていた。 (the sentence the word was from)
You can just delete all the info you don't want and add definitions on your own... Not super efficient, it might be faster to do everything manually, but the "usage" field is nice.
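About the "timestamp" field: judging by its size, it looks like milliseconds since 1 January 1970 (Unix time), though that is my assumption rather than something documented. Under that assumption, a short Python sketch to turn it into a readable date:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def kindle_timestamp_to_datetime(timestamp_ms):
    """Convert a millisecond Unix timestamp (assumed format) to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=timestamp_ms)
```

For the example above, `kindle_timestamp_to_datetime(1387947332934)` gives 2013-12-25 04:55:32.934 UTC, which matches when I looked the word up.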
Thanks again. My issue was that I had the Kindle system folders and files hidden.
Tomorrow, or whenever I have time, I'll need to download and install an SQLite program so I can open vocab.db.
Indeed, the context field is nice and I will have to create my own definition manually.
One other question I now have: how do you keep track of which vocabulary flashcards you have already imported into Anki and which ones are new and still need to be processed? I mean, since the flow is manual... I guess I could write a routine for that if I were a programmer, but since I'm not, what would you advise so that I know which word was the last one processed?
Thanks
Glad you found it!
I'm not an expert at all with those things either, but I'm pretty sure any SQLite program (I'm using Sqliteman btw, on Linux) lets you sort a table by one of its fields. In our case, that would be the "timestamp" field. In Sqliteman, I choose the "words" table and a grid shows up listing every word, id, language, timestamp etc. I double-click timestamp and the rows get sorted by time added.
Alternatively, you could always export the whole thing in csv and then sort it by timestamp using Excel.
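If you (or someone helping you) want to automate the "what is new since last time" part, the idea is to note the largest timestamp you have already imported and only fetch rows newer than that. Here is a rough Python sketch; it assumes the `words` table has `word` and `timestamp` columns as described above, and the function name is made up.

```python
import sqlite3


def words_added_since(db_path, last_timestamp):
    """Return (word, timestamp) rows newer than `last_timestamp`, oldest first."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT word, timestamp FROM words "
            "WHERE timestamp > ? ORDER BY timestamp",
            (last_timestamp,),
        ).fetchall()
    finally:
        connection.close()
    return rows
```

After each import, write down the timestamp of the last row returned and pass it in next time; passing 0 the first time returns everything.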

