Top Japanese kindle books sorted by difficulty - Printable Version
kanji koohii FORUM (http://forum.koohii.com)
Thread: Top Japanese kindle books sorted by difficulty (/thread-12609.html)

Top Japanese kindle books sorted by difficulty - zaydana - 2015-03-24

Yep, I'd seen it but hadn't had a good read through it until just now. It is fascinating that they're going based on single characters instead of word boundaries - makes me feel a little less bad about rating based purely on Kanji.

The thing I'm not so sure I agree with in OBI2 is that they're basically just matching text to Japanese school grades. Learning Japanese as a second language, you're not necessarily going to learn everything in the same order as school kids will. It seems to me that a better rating for us second-language learners is to estimate how "typical" a Japanese text is, as opposed to predicting which grade it would be learned in. This way, studying by moving up the typicalness scale will let you read an increasing number of texts online.

Either way, it is a good read. Thanks!

Top Japanese kindle books sorted by difficulty - aldebrn - 2015-03-24

zaydana Wrote: I won't be able to get it working immediately, because an easiness ranking based on vocab will need a *much* larger corpus than I've currently got, but it is definitely something I want to do. Thanks for pointing them out!

cb ran his Japanese Text Analysis Tool on a corpus of 5000+ modern Japanese novels and posted the results on MegaFireShareCrap (link and description at http://forum.koohii.com/showthread.php?pid=167828#pid167828). I reposted the resulting text files to https://gist.github.com/fasiha/779f73f802b80520db4a which you can git-clone or download via browser.

Included:
- word_freq_report.txt: MeCab base lemma frequency report, and probably the most relevant to this project. Columns are "frequency count, MeCab lemma, percentage, cumulative percentage, raw MeCab part-of-speech analysis".
- kanji_freq_report.txt: kanji frequency report.
- formula_based_readability_report.txt: readability formula applied to each file in the "corpus".

NB. Links to the full "Innocent corpus" of 5000+ novels may be found on this forum…

Top Japanese kindle books sorted by difficulty - zaydana - 2015-03-26

aldebrn, this is amazing. Thanks!

I'm still thinking about how to use the words in the score, but for the moment I've rerun my algorithm using your kanji frequency report instead of building one from the tested books. I've also increased the number of Kanji I use to 7000. The results seem to make a little more sense than previously.

On each book page, I've also added color-coded kanji based on the Kyoiku Kanji, and a graph of how many Kanji are introduced over time at the bottom of each page. This should let people get a better idea of how accurate each score is.

One night I need to run my analysis on the books in the Innocent corpus to see how it matches up with the two academic scores. Will post results once that is done.

Top Japanese kindle books sorted by difficulty - aldebrn - 2015-03-26

Excellent, but please note that the data and work are entirely cb4960's!

I love the new graphics and visualization elements on Read Your Level. I know it'd be a pain in the butt, but if you made the graph interactive (using d3.js or something) so that a mouse hovering over a point would indicate what kanji the point corresponds to, that would be (a) useful and (b) super-slick.

What color scheme did you use for the grades? I recently started using the color scheme at kanken.or.jp (code to save you two minutes) and admit that although it seems arbitrary I'm getting used to it, and I'm wondering if your colors come from somewhere like that. I also like using the Kanken colors/levels to break down the secondary school years (this was a suggestion by Roketzu for another tool).

I'm also really curious what results you get if you do get to run the excerpts through MeCab and analyze the frequency of *words* against cb's word_freq_report.txt, that is, vocab-based ranking instead of kanji-based ranking (or a combination of the two).
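
A minimal sketch of the vocab-based ranking idea, for anyone who wants to experiment with word_freq_report.txt. It assumes the file is tab-separated with the five columns in the order listed earlier in the thread; the delimiter is a guess and is not stated anywhere here. The function names, the `cutoff` parameter, and the "fraction of rare words" score are hypothetical illustrations, not what Read Your Level does. The lemmas for an excerpt are assumed to come from a separate MeCab pass.

```python
from typing import Dict, Iterable, List


def load_word_ranks(lines: Iterable[str], delimiter: str = "\t") -> Dict[str, int]:
    """Map each lemma to its frequency rank (1 = most frequent).

    Assumes column 0 is the frequency count and column 1 is the lemma.
    Lines that do not fit that shape are skipped.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) < 2:
            continue
        try:
            count = int(fields[0])
        except ValueError:
            continue
        lemma = fields[1]
        # Assumption: a lemma listed more than once (e.g. with different
        # part-of-speech tags) has its counts summed.
        counts[lemma] = counts.get(lemma, 0) + count
    # Sort by descending count; ties are broken by the lemma itself.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {lemma: rank for rank, lemma in enumerate(ordered, start=1)}


def fraction_rare(lemmas: List[str], ranks: Dict[str, int], cutoff: int) -> float:
    """Share of tokens whose lemma is unknown or ranked below `cutoff`."""
    if not lemmas:
        return 0.0
    rare = sum(1 for lemma in lemmas if ranks.get(lemma, cutoff + 1) > cutoff)
    return rare / len(lemmas)


if __name__ == "__main__":
    # Made-up sample rows in the assumed format, not real report data.
    sample = [
        "100\tの\t10.0\t10.0\t助詞",
        "50\t猫\t5.0\t15.0\t名詞",
        "1\t憂鬱\t0.1\t15.1\t名詞",
    ]
    ranks = load_word_ranks(sample)
    print(fraction_rare(["の", "猫", "憂鬱", "薔薇"], ranks, cutoff=2))
```

For the real file, something like `load_word_ranks(open("word_freq_report.txt", encoding="utf-8"))` should work if the encoding and delimiter assumptions hold. A lower `fraction_rare` would suggest an easier excerpt.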

I guess to be scientific you'd have a set of books whose relative difficulty you could order, and you'd test these various ranking methods against that...?

Top Japanese kindle books sorted by difficulty - zaydana - 2015-03-26

For color scheme, I used this: http://en.wikipedia.org/wiki/Ky%C5%8Diku_kanji#/media/File:3002_Kanji.svg

Also, the graph was already using d3 and had the popovers there but commented out. I've got them working, so you should see the kanji if you hover over a dot on the graph.

Didn't think about kanken, that sounds like a good idea too. Will add it to the list.

The problem with running a word-based ranking is that the current algorithm isn't a simple ranking. It scores each kanji based on estimated "painfulness" instead, which is based on the value of a logistic curve at the kanji's frequency rank (see the graph at http://readyourlevel.jamesknelson.com/articles/how-the-easiness-score-is-calculated). This works well for 2000 kanji with the inflection point at 500, but I'd have to experiment a bit to get it to work for words. I'll report back (and to my mailing list) once it is done, but I don't think it'll happen within the next week.

Top Japanese kindle books sorted by difficulty - Aikynaro - 2015-03-29

Some more books to rank - these from my 'looks interesting but too difficult to bother with' list:

http://www.amazon.co.jp/ebook/dp/B00N7BXSC6/
http://www.amazon.co.jp/ebook/dp/B009GPM3PA
http://www.amazon.co.jp/ebook/dp/B00GJMUNB4
http://www.amazon.co.jp/ebook/dp/B00IWHT7VG
http://www.amazon.co.jp/ebook/dp/B00BMM5OV0
http://www.amazon.co.jp/ebook/dp/B00GJMUVH0
http://www.amazon.co.jp/ebook/dp/B00M9372D4
http://www.amazon.co.jp/ebook/dp/B00BWF6YQQ
http://www.amazon.co.jp/ebook/dp/B00ARCOVLU
http://www.amazon.co.jp/UFO-1-ebook/dp/B00IUAYBCA
http://www.amazon.co.jp/ebook/dp/B00AJCM55W

Top Japanese kindle books sorted by difficulty - zaydana - 2015-04-06

Thanks Aikynaro, I've put them on the list for next week.

In the meantime, I've just made a rather major update:
- The difficulty rating is now based on word frequencies, not just kanji frequencies!
- You can see the number of (possibly difficult) words in the sample, instead of just the number of kanji
- Furigana are now taken into account, and you can find books with lots of furigana on the index page
- Instead of graphing the introduction of new words, you can see a graph of "frustration", which is modelled on the closeness of difficult-looking words

I've also removed books which are too hard from the listing, since nobody seems to be interested in them, and added the following books from a list I found elsewhere on the site:
- GO
- 涼宮ハルヒの憂鬱
- 失はれる物語
- しあわせは子猫のかたち

Would love to hear your opinions on what the next step should be to make the site more useful!
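
For readers curious about the logistic "painfulness" scoring described earlier in the thread, here is a rough sketch under stated assumptions. Only the logistic shape and the inflection point at rank 500 come from the thread. The `steepness` value, the default rank for unseen kanji, the use of a plain mean to combine per-kanji scores, and all function names are hypothetical choices made for illustration, so this is not the site's actual implementation.

```python
import math
from typing import Dict


def painfulness(rank: float, midpoint: float = 500.0, steepness: float = 0.01) -> float:
    """Logistic curve over frequency rank: near 0 for common kanji, near 1 for rare.

    `midpoint` is the inflection point (500 in the thread); `steepness`
    is an assumed value, not taken from the site.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (rank - midpoint)))


def is_kanji(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def text_painfulness(text: str, kanji_ranks: Dict[str, int], unknown_rank: int = 2000) -> float:
    """Mean painfulness over every kanji occurrence in `text`.

    Kanji missing from `kanji_ranks` are treated as rank `unknown_rank`
    (assumption). Text without kanji scores 0.0.
    """
    scores = [
        painfulness(kanji_ranks.get(ch, unknown_rank))
        for ch in text
        if is_kanji(ch)
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Made-up ranks for illustration only.
    ranks = {"日": 1, "本": 5, "鬱": 1900}
    print(text_painfulness("日本", ranks))
    print(text_painfulness("憂鬱", ranks))
```

Extending this to words would presumably mean swapping kanji ranks for lemma ranks and moving `midpoint` much higher, since a working vocabulary is far larger than 2000 items; where exactly to put it is the experimentation zaydana mentions.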