Top Japanese kindle books sorted by difficulty

#1
It seems a lot of people here are struggling to find material at their level. Finding books which are difficult enough to learn from but easy enough to not make you want to quit altogether is ... difficult.

I've built a small solution for this called Read Your Level: it will help you find books which match your difficulty level using an automatically generated easiness score, as well as showing you the count of each Kanji in the first 6000 characters. It links directly to the Amazon page, where you can download a free sample of the book (you don't actually need a Kindle), see a description, reviews, etc.

The "easiness score" I've generated doesn't correspond to JLPT levels, but you should still find it useful, as it corresponds quite well to the difficulty of the text (as judged by some Japanese friends and me).

For the moment, you'll see books I've manually added from Amazon's top 100 (both paid and free). Let me know if there are any books you'd like to see in the list - as long as I can get a free sample on Amazon with > 6000 characters, I can add it!

Also, it is free, but if you find it useful please let me know!
#2
I think this is a great idea. Too many people go around reading books hopelessly above their level and wonder why reading isn't fun and why they don't do very much of it.

Though I have noticed that a lot of kindle books don't have previews - which seems absurd seeing they're already digitised.

Can you tell us how your algorithm works? On what basis does it judge difficulty?

None of these are on Kindle, I think - but I'd be interested to see what level they get judged as:
http://www.amazon.co.jp/exec/obidos/ASIN/4046310189
http://www.amazon.co.jp/exec/obidos/ASIN/4265075029/
http://www.amazon.co.jp/exec/obidos/ASIN/4904826930/
http://www.amazon.co.jp/exec/obidos/ASIN/4265075045
http://www.amazon.co.jp/exec/obidos/ASIN/4048739964/


edit: or actually, maybe I just misunderstand how these things work. Is the view-sample process for kindle different than for paper books? If so, I'm going to guess you can't run the ones I linked through your thing because of the formatting. So, oh well.
Edited: 2015-03-20, 8:11 am
#3
Very cool. Thanks a lot for your work!
#4
Hi Aikynaro,

Thanks for the suggestions! 2 of those links have kindle books, you can see them by pressing "Kindle版". I've added the one with more than 6000 characters! Here it is:

http://readyourlevel.jamesknelson.com/bo...A%E3%81%84

Anything which is on Kindle has a sample, I'm using the samples to calculate difficulty.

As for the algorithm - it is pretty simple. It counts the occurrences of all Kanji across all the books in the database, then adds "difficulty points" to each book for each kanji. Frequent kanji get few points, infrequent Kanji get lots of points. And if a single Kanji appears much more often than average in a book, it gets fewer points, since you'll generally learn it as you read.
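
A rough Python sketch of that idea follows. It is not the site's actual code: the function names, the log-based rarity weight, and the exact shape of the repetition discount are all assumptions made for illustration, and "kanji" is taken to mean the CJK Unified Ideographs block only.

```python
import math
from collections import Counter


def is_kanji(ch):
    # Assumption: "kanji" means the CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_counts(text):
    return Counter(ch for ch in text if is_kanji(ch))


def difficulty_scores(books):
    """books maps title -> sample text; returns title -> difficulty points."""
    per_book = {title: kanji_counts(text) for title, text in books.items()}

    corpus = Counter()    # total occurrences of each kanji across all books
    doc_freq = Counter()  # number of books each kanji appears in
    for counts in per_book.values():
        corpus.update(counts)
        doc_freq.update(counts.keys())
    total = sum(corpus.values())

    scores = {}
    for title, counts in per_book.items():
        score = 0.0
        for kanji, n in counts.items():
            # Frequent kanji earn few points, rare kanji earn many.
            rarity = -math.log(corpus[kanji] / total)
            # Hypothetical discount: a kanji repeated more often than its
            # average count (over the books containing it) earns fewer points.
            average = corpus[kanji] / doc_freq[kanji]
            discount = 1.0 / (1.0 + max(0.0, n / average - 1.0))
            score += rarity * discount
        scores[title] = score
    return scores
```

Calling `difficulty_scores({"title A": text_a, "title B": text_b})` returns one number per title; only the relative order of the numbers is meaningful, and it changes whenever books are added to the corpus.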

There are a lot of things I could do to improve the algorithm, but thought I'd throw out v1 to see if it is actually helpful for people before I spend too much time trying to improve it.
#5
I made this thread a while ago - it might be of interest to you. There's a couple of algorithms to give texts a difficulty rating out there - it would be interesting to see how yours compares to the ones developed in academia.

Going through my book lists to see if any others have kindle versions:
http://www.amazon.co.jp/303-3-ebook/dp/B009TPQVLY
http://www.amazon.co.jp/ebook/dp/B00IUAYHXI
http://www.amazon.co.jp/ebook/dp/B00IFTKTR2
http://www.amazon.co.jp/ebook/dp/B00IUAYBSE

Almost nothing does though...
#6
Thanks again Aikynaro! I added two of the books to the list:

ミナミノミナミノ
時をかける少女

But I couldn't add these two due to short samples:

霧のむこうのふしぎな町
きまぐれロボット

Also, thanks for the link to the thread about algorithms, it is an interesting read. I hadn't even considered that it would be something academia has looked at, so the link to Wikipedia's "Readability" page is interesting. I'll do some research and try and improve the algorithm this coming week - it still has a few weak points. In particular, it only works on frequency - so kanji which are not frequently used in novels (e.g. 虫) get rated as "hard" even if they're first grade.

I added a few books from the discussion in the linked thread as well:

きみにしか聞こえない
ボッコちゃん
ゼロの使い魔

In particular, I think the rating for ゼロの使い魔 can be improved. I'll post again once I've had a shot at improving the algorithm.
#7
I've read the first volume of ゼロの使い魔 and am about halfway through the second. They are great fun but would be pretty tough going without Rikai so your rating seems reasonable.

By far the easiest book I have so far encountered is 家事のニホヘト, despite the indecipherable title.

Here is the first paragraph:
とつぜんですが、ぞうきんどうしてますか?
手で縫ったぞうきんはぬいぐるみに対するような愛情が涌いてしまって 「使うのがもったいない」 という気持ちになってしまう私。
かといってスーパーでぞうきんを買うのもちょっと。
それよりなにより一番気になるのは、使ったあとのぞうきんの置き場所です。
目につくところには置きたくないけれど、しまい込むと使う時に億劫。
どうしよう、どうしたらいい?
。。。試行錯誤の末に行きついたのが 「着なくなったTシャツを適当な大きさに切ってぞうきんにする」 ことでした。
これを小さなかごに入れて汚れが気になったらさっと取り出し、水で濡らしてぎゅぎゅーっとしぼり、あらゆるところを拭くのです。
キッチンのコンロまわりの油はねはもちろん、冷蔵庫の取っ手、食器棚の中、椅子の足、竹のかご、ゴミ箱の中、窓の桟。。。汚れに気がついたら拭く。気がつかなくても拭く。とにかく拭く。拭き終わったら、ゴミ箱にポイと捨てて掃除はおしまい。
不要な服の処分ができ、家中がきれいになり、汚れたぞうきんを見ることもない、すてきな方法ではありませんか!

Easy as it is, even this contained one kanji (涌) and a word (億劫) I didn't know.
#8
Wow!!! Awesome! Do you pull down the unencrypted mobi files as previews to run the analysis on?

Can you add 魔女の宅急便?
#9
The title is a reference to the classic いろは poem, which used to be the way to alphabetically arrange the kanji. Since いろは also came to be used like "the ABCs" in English to mean "the basics", the title of that book essentially means the house duties beyond the basics (or the next step past the basics).

Don't light novels tend to have a lot of furigana? That would make the amount of kanji less important in the difficulty level than the grammar and vocabulary.
Edited: 2015-03-21, 1:35 pm
#10
yudantaiteki Wrote:The title is a reference to the classic いろは poem, which used to be the way to alphabetically arrange the kanji.
Thanks for that. I knew いろは meant ABCs but didn't know why until now.

Incidentally, a search turns up what is quite possibly the most misleading 'translation' of a sentence I've ever seen:

いろはにほへとちりぬるをわかよたれそつねならむうゐのおくやまけふこえてあさきゆめみしゑひもせすん
The quick brown fox jumps over the lazy dog.
#11
Sorry if it's stated elsewhere, but does this tool take into consideration which kanji have furigana?
#12
Thanks everyone for your suggestions! I'll add as many books as I can tomorrow - I've been flat out improving the site today. I didn't expect to get this amazing response!

I've updated the site today with a little bit of detail on how the algorithm works and its strengths and weaknesses. I've also added a few featured books based on people's suggestions, both on the internet and in Japan.

cophnia61 Wrote:Sorry if it's stated elsewhere, but does this tool take into consideration which kanji have furigana?
Unfortunately not - it ignores all kana. This is something I'd like to add in the future, although I'm not exactly sure how to tell what is furigana and what isn't.
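
One possible way to tell them apart, sketched in Python under a big assumption: that the sample text is available as HTML in which furigana is marked up with the standard `<ruby>` / `<rt>` elements. The thread doesn't say what format the samples come in, so this may not apply; the class and function names here are made up for illustration.

```python
from html.parser import HTMLParser


class RubyCollector(HTMLParser):
    """Collects (base text, reading) pairs from <ruby>base<rt>reading</rt></ruby>."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._in_ruby = False
        self._in_rt = False
        self._in_rp = False
        self._base = []
        self._reading = []

    def handle_starttag(self, tag, attrs):
        if tag == "ruby":
            self._in_ruby = True
            self._base, self._reading = [], []
        elif tag == "rt":
            self._in_rt = True
        elif tag == "rp":
            self._in_rp = True

    def handle_endtag(self, tag):
        if tag == "rt":
            self._in_rt = False
        elif tag == "rp":
            self._in_rp = False
        elif tag == "ruby":
            if self._in_ruby and self._base:
                self.pairs.append(("".join(self._base), "".join(self._reading)))
            self._in_ruby = False

    def handle_data(self, data):
        # Text outside <ruby>, or inside the <rp> fallback parentheses, is ignored.
        if not self._in_ruby or self._in_rp:
            return
        (self._reading if self._in_rt else self._base).append(data)


def kanji_with_furigana(html):
    """Returns the set of kanji that appear under a furigana annotation."""
    parser = RubyCollector()
    parser.feed(html)
    parser.close()
    return {
        ch
        for base, _reading in parser.pairs
        for ch in base
        if "\u4e00" <= ch <= "\u9fff"
    }
```

Kanji in the returned set could then be given fewer difficulty points, or none. The sketch expects explicitly closed `</rt>` and `</ruby>` tags, so markup that omits them would need extra handling.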
#13
zaydana

On this page, the 'our recommendation' link points to localhost :)
#14
Thanks for pointing that out anotherjohn! I've just fixed it.
#15
This looks Awesome Zaydana. Thanks!

zaydana Wrote:For the moment, you'll see books I've manually added from Amazon's top 100 (both paid and free). Let me know if there are any books you'd like to see in the list - as long as I can get a free sample on Amazon with > 6000 characters,
You have a good reason for the 6000 character limit, but I'm curious... have you compared the results for any books based on their full length? Many books use new vocabulary when new scenes are introduced, and 6000 characters is probably only 15-20 pages of the whole book.
#16
@anotherjohn, I had a look for 家事のニホヘト, but couldn't find an ebook so couldn't add it, sorry :(

@aldebrn, I added 魔女の宅急便 - it came out as "hard", but a quick glance at the Kanji breakdown makes me think this is another one of those books where the algorithm could use a bit of improvement :) I'll be working on making it better over the next week.

@juniperpansy, comparing some full length books with others that weren't full length caused problems when I gave it a shot. I think I understand the problem, and it should go away if I train the algorithm on a bunch of books which aren't in the actual listing. It is definitely something in the works.
#17
Amount of kanji isn't a good way to rate difficulty. From what I know texts can vary wildly in both kanji count and difficulty.
#18
ryuudou Wrote:Amount of kanji isn't a good way to rate difficulty. From what I know texts can vary wildly in both kanji count and difficulty.
I agree. Something that was calculated based off of frequency lists seems like it would be more accurate.

I'm pretty confident that こころ isn't easier to read than 魔女の宅急便。
#19
zaydana Wrote:@juniperpansy, comparing some full length books with others that weren't full length caused problems when I gave it a shot.
Yep totally. What I was trying to ask though is, say for example there is a book "Bob goes to Japan"

Run this book through your program twice
First run -> the first 6000 characters only
Second run -> the full book

The second run will probably have more kanji. Let's say it has 30% more kanji. But if you do the same thing with another book, the second run may have only 1% more kanji than the first run. Obviously if this happened it would be an issue with the algorithm.

So basically just wanted to ask if you compared the variances of different books as described above... and if so, how did the variances look?
#20
That is a good point, juniperpansy. I'll purchase a few books with samples this week and run both the whole books and the samples through to see what happens.

I agree with everyone that just basing this on kanji isn't ideal - ideally I'd like to count vocabulary as well. The problem is that building a vocab frequency list turns out to be a lot harder than it first seemed, due to the difficulty of splitting Japanese text into words (there are no spaces to help, as there are in English, and it isn't always easy to de-conjugate to the dictionary form).

I'll get around to doing this at some point, but I think you'll see a decent improvement by using Wikipedia as a corpus instead of the books - this would remove some of the bias that the self help books generate, etc. This will also take waaaay less time to finish :) Also planned for sometime this week is to try mixing the current algorithm with the Hayashi algorithm so it no longer just measures Kanji, but also measures sentence length.
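
To give a feel for what adding sentence length might look like, here is a minimal Python sketch. It does not reproduce the Hayashi formula; the sentence-splitting rule, the `blended_score` helper, and the `weight` parameter are all placeholders invented for illustration.

```python
import re

# Assumption: sentences end at a Japanese or ASCII full stop-like mark.
SENTENCE_END = re.compile(r"[。！？!?]+")


def mean_sentence_length(text):
    """Average number of characters per sentence (0.0 for empty text)."""
    sentences = [s.strip() for s in SENTENCE_END.split(text)]
    sentences = [s for s in sentences if s]
    if not sentences:
        return 0.0
    return sum(len(s) for s in sentences) / len(sentences)


def blended_score(kanji_score, text, weight=0.5):
    # Placeholder blend: the weight is arbitrary and would need tuning
    # against books whose relative difficulty is already known.
    return kanji_score + weight * mean_sentence_length(text)
```

In practice the two parts are on different scales, so they would probably need normalising before being added together.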

I also want to add a method for feedback, so when dumb things happen like こころ being rated as easier than 魔女の宅急便, or Peter Rabbit being rated near 日本国憲法, I can use the feedback to improve the algorithm.

Again, thanks a lot for the feedback! Knowing what people want improved makes it a lot easier to decide on what to work on.
Edited: 2015-03-23, 8:25 pm
#21
Zaydana, would it help at all if I set up a MeCab server that parsed input via a REST API? I know it's not perfect, but I'm wondering if in your testing it's good enough. Especially if you consider MeCab post-processing tools like Ve and j.DePP.
#22
Aldebrn, to be honest I'd never heard of MeCab before, or Ve, or j.DePP. I'm kinda new to this whole area. Reading the docs is going to take some effort for me, so can I ask what in particular you think it'd be useful for?
#23
They are all language/grammar processors so they can split up a sentence into words and grammar components. You could then use that to judge if a text was hard or easy based on the actual vocabulary used.
#24
Ok! I'm not sure about using it through a REST API, seems like it would be better to try and get it working locally. I won't be able to get it working immediately, because an easiness ranking based on vocab will need a *much* larger corpus than I've currently got, but it is definitely something I want to do. Thanks for pointing them out!
#25
Did you see this? http://kotoba.nuee.nagoya-u.ac.jp/sc/obi...c-sato.pdf