I'm trying to find a tool that can scan a text and tell me the number of kanji from each grade level, similar to the kanji statistics in Anki. I thought that surely there must be something like this, but searching around for a couple of hours hasn't turned anything up for me. Any ideas?
2015-03-11, 5:13 pm
2015-03-12, 1:11 am
I threw together something because I had all the pieces on hand:
http://fasiha.github.io/kanjiyears/
It's not much, but maybe it'll be useful. I'll make it prettier and whatnot tomorrow. (Code at https://github.com/fasiha/kanjiyears released to public domain. Though I'm not sure if the list of joyo kanji is copyrighted by the Japanese Cabinet...)
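For readers wondering what such a tool does at its core, here is a minimal sketch (not the kanjiyears code itself): collect the distinct kanji in a text and group them by grade. It assumes a kanji-to-grade lookup table; `SAMPLE_GRADES` below is a hypothetical, tiny stand-in for the full joyo list, and "kanji" is approximated as any character with the Unicode Script=Han property.

```javascript
// Hypothetical, tiny sample of a kanji -> grade table. A real tool would load
// the full joyo list (grades 1-6 plus secondary school).
const SAMPLE_GRADES = new Map([
  ["一", 1], ["右", 1], ["雨", 1],
  ["引", 2], ["羽", 2],
  ["悪", 3],
  ["愛", 4],
]);

// Distinct Han-script characters in `text`, in order of first appearance.
function uniqueKanji(text) {
  const seen = new Set();
  // for...of walks code points, so kanji outside the BMP (e.g. 𠮟) stay whole.
  for (const ch of text) {
    if (/\p{Script=Han}/u.test(ch)) seen.add(ch);
  }
  return [...seen];
}

// Group the distinct kanji of `text` by grade; unlisted kanji go under "other".
function kanjiByGrade(text, gradeOf) {
  const groups = new Map();
  for (const k of uniqueKanji(text)) {
    const grade = gradeOf.has(k) ? gradeOf.get(k) : "other";
    if (!groups.has(grade)) groups.set(grade, []);
    groups.get(grade).push(k);
  }
  return groups;
}

const result = kanjiByGrade("一羽の愛、右に雨、薔薇", SAMPLE_GRADES);
for (const [grade, kanji] of result) {
  console.log(`${grade}: ${kanji.length} (${kanji.join("")})`);
}
```

With the sample table this reports three grade-1 kanji (一右雨), one each for grades 2 and 4, and 薔薇 under "other".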
2015-03-12, 5:11 am
Strangely enough I recently found myself wishing I had a tool just like this, so thank you aldebrn, this is greatly appreciated.
If I could suggest something to make it more useful, it would be to have the "Secondary" kanji split up further in accordance with the kanji kentei, like so:
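10級: completion of elementary school grade 1 (80 kanji)
9級: completion of elementary school grade 2 (160 kanji; 240 cumulative)
8級: completion of elementary school grade 3 (200 kanji; 440 cumulative)
7級: completion of elementary school grade 4 (200 kanji; 640 cumulative)
6級: completion of elementary school grade 5 (185 kanji; 825 cumulative)
5級: completion of elementary school grade 6 (181 kanji; 1006 cumulative)
4級: junior high school, in progress (316 kanji; 1322 cumulative)
3級: junior high school graduation (285 kanji; 1607 cumulative)
準2級: high school, in progress (333 kanji; 1940 cumulative)
2級: high school graduation, university, and general adult level (196 kanji; 2136 cumulative)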
( source: http://kakijun.jp/main/kankenmain.html )
This way it accurately covers each grade level, and also provides further distinction between the secondary-level kanji. Again, thanks a lot for taking the trouble to put this together!
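The cumulative figures in the kanken list are just a running sum of the per-level counts. A quick sketch to check them, using only the numbers quoted above:

```javascript
// New kanji introduced at each Kanken level, as listed above.
const KANKEN_LEVELS = [
  ["10級", 80], ["9級", 160], ["8級", 200], ["7級", 200], ["6級", 185],
  ["5級", 181], ["4級", 316], ["3級", 285], ["準2級", 333], ["2級", 196],
];

// Running total: how many kanji you are expected to know after each level.
function cumulativeTotals(levels) {
  let total = 0;
  return levels.map(([name, count]) => [name, (total += count)]);
}

for (const [name, total] of cumulativeTotals(KANKEN_LEVELS)) {
  console.log(`${name}: ${total}`);
}
```

Every running total matches the "cumulative" column, ending at 2136 for 2級 (the full joyo list).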
Edited: 2015-03-12, 5:39 am
2015-03-12, 5:36 am
Thanks a lot! Highly appreciated!
2015-03-12, 10:03 am
Roketzu Wrote: If I could suggest something to make it more useful, it would be to have the "Secondary" kanji split up further in accordance with the kanji kentei, like so:
Suggestions are always welcome, especially when so helpfully detailed.
Updated so secondary school and non-joyo kanji are further broken down into kanken levels.
2015-03-12, 10:46 am
aldebrn Wrote: Roketzu Wrote: If I could suggest something to make it more useful, it would be to have the "Secondary" kanji split up further in accordance with the kanji kentei, like so:
Suggestions are always welcome, especially when so helpfully detailed.
Updated so secondary school and non-joyo kanji are further broken down into kanken levels.
Wonderful! I almost feel bad to suggest another thing, but I do think it would make the tool more complete. It would be nice if it were possible to have a toggle which showed which characters from the pasted text are not included in each level, making it possible to easily figure out which characters you might be missing when trying to devise a comprehensive list. Maybe showing them in a different color, or whatever you think would be best.
Just as an example of how it would help me, I've been working through pre-built word lists in Skritter and I'm just assuming they are covering every kanji at each level, but it's difficult to know which characters I'm missing when the numbers don't add up!
Edited: 2015-03-12, 11:07 am
2015-03-12, 11:07 am
Roketzu Wrote: It would be nice if it were possible to have a toggle which showed which characters from the pasted text are not included in each level, making it possible to easily figure out which characters you might be missing when trying to devise a comprehensive list. Maybe showing them in a different color, or whatever you think would be best.
Could you explain further? I don't quite understand. Do you mean that instead of showing:
- Grade 1: kanji in grade 1
- Grade 2: kanji in grade 2
...
it show:
- kanji you don't know if you're not yet in school: (all kanji in text)
- kanji you don't know if you finished grade 1: (smaller list)
- kanji you don't know if you finished grade 2: (even smaller list)
...
- kanji you don't know if you've finished 12th grade: (non-joyo)
- kanji you don't know if you've passed kanken (say) 4: (smaller list of non-joyo kanji)
...
- kanji you don't know if you've passed kanken 1.5: (non-kanken kanji)
?
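The cumulative view described in the list above ("kanji you don't know if you finished grade N") can be sketched as a growing set of known kanji: after each level, report the text's kanji not yet covered. `LEVELS` is hypothetical sample data, not the real grade lists.

```javascript
// Levels in study order, each listing the kanji it introduces.
// Hypothetical, tiny sample data for illustration only.
const LEVELS = [
  ["Grade 1", ["一", "右", "雨"]],
  ["Grade 2", ["引", "羽"]],
  ["Grade 3", ["悪"]],
];

// For each level, list the text's kanji that are still unknown after finishing it.
function unknownAfterEachLevel(textKanji, levels) {
  const known = new Set();
  const report = [["(not yet in school)", [...textKanji]]];
  for (const [name, kanji] of levels) {
    kanji.forEach((k) => known.add(k));
    report.push([`after ${name}`, textKanji.filter((k) => !known.has(k))]);
  }
  return report;
}

const textKanji = ["雨", "羽", "悪", "薔"];
for (const [label, unknown] of unknownAfterEachLevel(textKanji, LEVELS)) {
  console.log(`${label}: ${unknown.join("")}`);
}
```

Each list is a subset of the previous one, which is what the proposed display implies; whatever remains after the last level is the non-joyo remainder.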
2015-03-12, 11:17 am
aldebrn Wrote: Roketzu Wrote: It would be nice if it were possible to have a toggle which showed which characters from the pasted text are not included in each level, making it possible to easily figure out which characters you might be missing when trying to devise a comprehensive list. Maybe showing them in a different color, or whatever you think would be best.
Could you explain further?
Sorry, I should have been clearer.
[Image: kanji223.png] http://s11.postimg.org/c6lqv06fn/kanji223.png
I mean something like this: where it says "Missing kanji," it would show which kanji from each grade didn't appear in the pasted text. I don't mean whether you know them or not per grade, just which ones belong to that grade but didn't appear in the text. You could ignore non-joyo kanji and kanken levels beyond 2 for this, because there would be too many to list!
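The "Missing kanji" idea is a set difference per level: the level's kanji minus the kanji that occur in the text. A minimal sketch, with `GRADE_LISTS` as hypothetical sample data:

```javascript
// Hypothetical sample data; a real tool would use the full grade/Kanken lists.
const GRADE_LISTS = new Map([
  ["Grade 1", ["一", "右", "雨"]],
  ["Grade 2", ["引", "羽"]],
]);

// For each level, return the kanji of that level that never occur in `text`.
function missingByLevel(text, levelLists) {
  const present = new Set(text); // a Set built from a string holds its code points
  const report = new Map();
  for (const [level, kanji] of levelLists) {
    report.set(level, kanji.filter((k) => !present.has(k)));
  }
  return report;
}

const missing = missingByLevel("右に雨、羽", GRADE_LISTS);
console.log(missing.get("Grade 1").join("")); // 一
console.log(missing.get("Grade 2").join("")); // 引
```

Running this over a study list (such as the Skritter word lists mentioned above) would show exactly which kanji of each level the list never touches.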
Edited: 2015-03-12, 11:33 am
2015-03-13, 12:02 pm
2015-03-13, 12:13 pm
aldebrn Wrote: Understood. See if you like how I did it:
http://fasiha.github.io/kanjiyears/
I pasted some text in the box and clicked on "Run" but it doesn't do anything. Is something wrong? Thanks.
2015-03-13, 12:20 pm
@aldebrn what you did was sweet!
I think what would be cool is if a page is added to the Koohii wiki and everybody can post their results for each book they use it on. I know one thing that keeps me from Japanese books is not knowing if they are at a reasonable level for me!
Having a list of books with their kanji level would be awesome!
2015-03-13, 12:21 pm
aldebrn Wrote: Understood. See if you like how I did it:
http://fasiha.github.io/kanjiyears/
Wow, yeah, this is perfect! I found those pesky kanji I was (seemingly) missing, and it seems like they were all non-standard variations on what I have actually already covered.
𠮟 = 叱
塡 = 填
剝 = 剥
頰 = 頬
I'm not sure if it's an issue with the kanji list you are pulling from, but the ones on the right are the correct (or more acceptable) variations on these kanji. You've made a great tool here and I hope others can get some good use out of it like I can/have done.
2015-03-13, 2:53 pm
juniperpansy Wrote: Having a list of books with their kanji level would be awesome!
http://forum.koohii.com/showthread.php?p...#pid167827
http://kotoba.nuee.nagoya-u.ac.jp/sc/obi2/obi_e.html
http://www.mediafire.com/download/e0euaw...120527.zip
2015-03-13, 2:56 pm
john555 Wrote: I pasted some text in the box and clicked on "Run" but it doesn't do anything. Is something wrong? Thanks.
Do you have JavaScript disabled? (JavaScript does all the processing client-side; there's no server that you're uploading your text to.) If it's enabled, can you tell me your OS and browser, with version numbers?
juniperpansy Wrote: I think what would be cool is if a page is added to the Koohii wiki and everybody can post their results for each book they use it on. I know one thing that keeps me from Japanese books is not knowing if they are at a reasonable level for me!
Having a list of books with their kanji level would be awesome!
You're right, that would be very interesting. I could add a feature where the app exports the data as a CSV file, which you could then paste on the wiki, etc., and the opposite feature, where you upload/paste the CSV file and it displays a nice visual report of it. Maybe even set up a backend so that you can automatically publish and share this kind of analysis?
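A CSV export along the lines proposed above could be as small as the sketch below. The row shape (`level`, `kanji`) and the column names are assumptions for illustration, not the app's actual data model; fields are always quoted, with embedded quotes doubled, in the style of RFC 4180.

```javascript
// Hypothetical report shape: one row per level with the distinct kanji found.
// Produces CSV text that could be pasted into a wiki or spreadsheet.
function toCsv(rows) {
  // Quote every field and double any embedded double quotes.
  const quote = (field) => `"${String(field).replace(/"/g, '""')}"`;
  const header = ["level", "count", "kanji"].map(quote).join(",");
  const lines = rows.map((r) =>
    [r.level, r.kanji.length, r.kanji.join("")].map(quote).join(","));
  return [header, ...lines].join("\r\n");
}

const csv = toCsv([
  { level: "Grade 1", kanji: ["一", "右", "雨"] },
  { level: "Kanken 準1級", kanji: [] },
]);
console.log(csv);
```

Quoting every field keeps the parser on the import side simple, since level names or notes containing commas cannot break the columns.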
Roketzu Wrote: I found those pesky kanji I was (seemingly) missing, and it seems like they were all non-standard variations on what I have actually already covered.
𠮟 = 叱
塡 = 填
剝 = 剥
頰 = 頬
Thanks for this!!! Unicode has (1) multiple codepoints for similar-looking kanji, some of which are included in Shift-JIS (and thus found in Japanese texts) while others look closer to the official Cabinet of Japan documentation, and (2) multiple codepoints that are supposed to be identical. Clearly my grasp of this is not good enough. I'll try to either just replace the data with the correct characters, or, if no “correct” one exists, I'll *gasp* try to do Unicode normalization…
Please keep the suggestions/bug reports coming.
2015-03-13, 2:58 pm
toshiromiballza Wrote: http://forum.koohii.com/showthread.php?p...#pid167827
http://kotoba.nuee.nagoya-u.ac.jp/sc/obi2/obi_e.html
http://www.mediafire.com/download/e0euaw...120527.zip
Yes, porting cb's Japanese Text Analysis Tool to JavaScript is probably very worthwhile, especially if it's hooked up to a backend so people can share reports on various texts and find texts at a given level.
Also, I just put the Innocent Novel Analysis reports as a gist so you don't have to be assaulted by MediaFire: https://gist.github.com/fasiha/779f73f802b80520db4a
Edited: 2015-03-13, 3:06 pm
2015-03-13, 4:00 pm
aldebrn Wrote: Clearly my grasp of this is not good enough, I'll try to either just replace the data to be correct, or if no “correct” exists, I'll *gasp* try to do Unicode normalization…
You only need to make an exception for the 4 kanji mentioned above: they are the only joyo kanji whose codepoints are outside of JIS X 0208, so their non-joyo variants are more commonly used (and allowed to be used).
2015-03-13, 4:27 pm
aldebrn Wrote: Please keep the suggestions/bug reports coming.
Those 4 were the only problem kanji out of the 2400+ I've covered with Skritter, so I'm pretty sure that, as far as joyo kanji are concerned, they are the only ones with any issue, as toshiromiballza says.
As for more suggestions, well I may be pushing my luck at this point but more color differentiation would be nice. It seems there is already an established color scheme to differentiate between grade/kanken levels ( http://www.kanken.or.jp/kanken/outline/degree.html ), so you could go by that, or something you think might be better.
I think it would be easier to digest the results if the missing kanji were all hidden by default and you had to click to reveal them, or maybe more color alone would solve this problem. Then possibly a summary of the results at the end, something like:
Summary:
Grade 1 → 26 kanji, 33% coverage
~~
~~
Kanken 4 → 10 kanji, 3.2% coverage
~~
~~
Total kanji:
(Any other stats you think might be worthwhile)
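The coverage figures in the example summary are consistent with "distinct kanji found at a level divided by the number of kanji in that level": 26/80 is 32.5%, shown as 33%, and 10/316 (the 4級 count) is about 3.16%, shown as 3.2%. A sketch of that summary line follows; the display rule (one decimal place below 10%, whole numbers otherwise) is a hypothetical choice that happens to reproduce both examples.

```javascript
// Kanji per level, taken from the Kanken table earlier in the thread.
const LEVEL_TOTALS = new Map([["Grade 1", 80], ["Kanken 4", 316]]);

// Coverage = distinct kanji from this level found in the text / kanji in the level.
function coverageLine(level, foundCount, totals) {
  const pct = (100 * foundCount) / totals.get(level);
  // Hypothetical display rule: one decimal place below 10%, whole numbers otherwise.
  const shown = pct < 10 ? pct.toFixed(1) : pct.toFixed(0);
  return `${level} → ${foundCount} kanji, ${shown}% coverage`;
}

console.log(coverageLine("Grade 1", 26, LEVEL_TOTALS));
console.log(coverageLine("Kanken 4", 10, LEVEL_TOTALS));
```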
2015-03-13, 9:47 pm
toshiromiballza Wrote: You only need to make an exception for the 4 kanji mentioned above, they are the only joyo kanji whose codepoints are outside of JIS X 0208 and their non-joyo variants are therefore more commonly used (and allowed to be used).
Thanks Toshiro-san, you are a gentleman/woman and a scholar. I've made a note that the tool might want to replace the non-JIS X 0208 codepoints present in the input text with the more common ones before any analysis.
This *should* be fixed now, @Roketzu?
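One way to do the replacement described above is a fixed four-entry map applied before any counting. This is a sketch under the assumption that the same mapping is applied to both the input text and the grade lists, so the two sides always agree. Plain Unicode normalization (NFC/NFKC) would not fold these pairs, because they are separate ideographs rather than canonical or compatibility equivalents.

```javascript
// The four joyo kanji whose official forms lie outside JIS X 0208, mapped to
// the variants that usually appear in real text (the pairs listed in this thread).
const JOYO_VARIANT_TO_COMMON = new Map([
  ["𠮟", "叱"],
  ["塡", "填"],
  ["剝", "剥"],
  ["頰", "頬"],
]);

// Replace each official form with its common variant before any analysis.
// for...of iterates by code point, so 𠮟 (outside the BMP) is handled as one character.
function normalizeVariants(text) {
  let out = "";
  for (const ch of text) out += JOYO_VARIANT_TO_COMMON.get(ch) ?? ch;
  return out;
}

console.log(normalizeVariants("頰を剝く")); // 頬を剥く
```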
Roketzu Wrote: more color differentiation would be nice. It seems there is already an established color scheme to differentiate between grade/kanken levels ( http://www.kanken.or.jp/kanken/outline/degree.html ), so you could go by that, or something you think might be better.
I think it would be easier to digest the results if the missing kanji were all hidden by default and you had to click to reveal them, or maybe more color alone would solve this problem. Then possibly a summary of the results at the end, something like:
Summary:
Grade 1 → 26 kanji, 33% coverage
~~
~~
Kanken 4 → 10 kanji, 3.2% coverage
~~
~~
Total kanji:
(Any other stats you think might be worthwhile)
- Is there a color scheme for *grades*, like your link gives for kanken?
- Is the problem that the separation between grades/kanken levels is visually ugly? I can definitely agree with this and will try to make it easier to understand. What kind of text are you using the tool with? I'm just testing it with, e.g., the first couple of paragraphs of Botchan (http://www.natsumesoseki.com/home/botchan), and it might help if I saw the same output as you—can you stick all the kanji in your text on a gist or pastebin?
- Oh man, as I was implementing the fancy bit where it shows only one line's worth of missing kanji and puts the … link at the right place to show the rest, I was thinking “This code is too clever by half, it's gonna have to be deleted soon.” Good riddance, I'll make the missing kanji default to hidden with a button to show them individually/all of them.
- Stats are a good idea. What's the ~ denote in your example?
2015-03-14, 4:21 am
aldebrn Wrote: - Is there a color scheme for *grades*, like [Roketzu's] link gives for kanken?
Kanken levels 10-5 correspond exactly to grades 1-6, so...
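Because Kanken levels 10 through 5 line up one-to-one with elementary grades 1 through 6, a color scheme (or any other table) keyed by Kanken level can be reused for grades with a simple conversion. A minimal sketch; the function name is hypothetical:

```javascript
// Kanken level for an elementary grade: grade 1 -> 10級, ..., grade 6 -> 5級.
// Only grades 1-6 have a one-to-one Kanken level; anything else is rejected.
function kankenForGrade(grade) {
  if (!Number.isInteger(grade) || grade < 1 || grade > 6) {
    throw new RangeError(`no one-to-one Kanken level for grade ${grade}`);
  }
  return 11 - grade;
}

console.log(kankenForGrade(1)); // 10
console.log(kankenForGrade(6)); // 5
```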
2015-03-14, 5:51 am
aldebrn Wrote: This *should* be fixed now, @Roketzu?
Yep, it now shows 100% coverage for kanken 1.5. This was the only level that wasn't already showing as 100% covered, so that problem is taken care of now! Also, it would be more accurate if the current kanken 2 were labeled 2.5 and what you have as 1.5 were kanken 2.
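The labeling suggested above follows the convention that a "pre" level (準N級) sits halfway above level N, so 準2級 is 2.5 and 準1級 is 1.5. A sketch of a hypothetical helper that turns the Japanese labels into numeric keys for display or sorting:

```javascript
// Hypothetical helper: "10級" -> 10, "準2級" -> 2.5, "2級" -> 2, "準1級" -> 1.5.
function kankenKey(label) {
  const m = /^(準?)(\d+)級$/.exec(label);
  if (!m) throw new Error(`unrecognized Kanken label: ${label}`);
  return Number(m[2]) + (m[1] ? 0.5 : 0);
}

// Sorting by descending key gives study order (easiest level first).
const ordered = ["2級", "準2級", "10級"].sort((a, b) => kankenKey(b) - kankenKey(a));
console.log(ordered.join(" ")); // 10級 準2級 2級
```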
aldebrn Wrote: Is there a color scheme for *grades*, like your link gives for kanken?
As Vempele pointed out, kanken levels 10-5 correspond to grades 1-6 =)
aldebrn Wrote: Is the problem that the separation between grades/kanken levels is visually ugly? I can definitely agree with this and will try to make it easier to understand. What kind of text are you using the tool with? I'm just testing it with, e.g., the first couple of paragraphs of Botchan (http://www.natsumesoseki.com/home/botchan), and it might help if I saw the same output as you. Can you stick all the kanji in your text on a gist or pastebin?
I've mostly been using it with the output of a 10,000+ word text file from Skritter, just to make sure I've covered all the kanji. This changes every day as I add new kanji, but I've already covered kanken 10 through 2, so everything I'm adding now is in the 準1級 to 1級 range. (Here is the raw kanji output of the previous list)
I've just been looking at the tool as something that could be a lot more helpful for those who are still learning kanji and want to have a fast and easy way to figure out what level a piece of text is, whether there are too many unknown kanji they haven't yet covered or something along those lines. I don't think it's visually ugly or anything, just that it currently throws a lot of text at you without any easy way to visually distinguish between levels or get an overall idea of the valuable information it provides.
aldebrn Wrote: Oh man, as I was implementing the fancy bit where it shows only one line's worth of missing kanji and puts the … link at the right place to show the rest, I was thinking “This code is too clever by half, it's gonna have to be deleted soon.” Good riddance
Haha, while I can't distinguish between what is clever code or not, I can certainly appreciate the end result and it was immediately impressive because it's exactly what I suggested! Coders are like magicians to me, and their magic is not lost even on a layman like myself.
aldebrn Wrote: Stats are a good idea. What's the ~ denote in your example?
That was just to save me from listing everything. I meant that all levels/grades would be shown in the summary.
Edited: 2015-03-14, 5:54 am
2015-03-17, 2:00 pm
Tweaked a few visual things and added a bit of statistics, could definitely use more work (e.g., a few years in design school), but see what you think. Comments/suggestions/rage welcomed.
http://fasiha.github.io/kanjiyears/
2015-03-17, 4:54 pm
This is slick, what a nice tool it's become in such a short time. I've even found an unexpected use out of it that has saved me a bunch of time, so thank you for that.
I can't even think of any more suggestions!
2015-03-17, 10:18 pm
Vempele Wrote: Kanken levels 10-5 correspond exactly to grades 1-6, so...
Thanks for this. I'm pretty embarrassed that I didn't know or notice this until you mentioned it, Vempele. The app and code became a lot more streamlined thanks to this fact.
john555 Wrote: I pasted some text in the box and clicked on "Run" but it doesn't do anything. Is something wrong? Thanks.
I'm so sorry! I tested it in IE on Windows 7 and saw that it doesn't implement promises, a new-ish JavaScript feature. I've added a polyfill, and it looks like it's working in IE now!
Edited: 2015-03-18, 1:50 pm
2015-03-17, 11:56 pm
Thanks a lot @aldebrn! :-)
I was looking for a tool exactly like this before. It is really helpful for analysing online news articles to assess their kanji level (I am still a beginner).
I looked over your JavaScript code and it is really good, functional-style code! Kudos!
I also took 'inspiration' from your code and implemented a basic version of the program in the Flask framework in Python (I'm learning Flask right now).
Thanks again!
2015-03-18, 11:20 am
Roketzu Wrote: I've even found an unexpected use out of it
Please share with us!

