When it comes to beginners and kanji characters, there is always a lot of debate about how many of them you need.
I started studying Japanese a bit over a year ago, and I knew pretty much zero Japanese beforehand. I started with Anki and RTK 1, then continued with assorted sentences. Now that my sentence deck has hit 2000 sentences (an arbitrary, round number :)), I figured I'm in a position to add my 2 cents to the "kanji count" discussion.
I was trying to do 10 new cards a day (that's all I had time for), and I was pretty successful at keeping that schedule.
Here is what my "Kanji added over time" chart looks like: 
So over 2000 sentences I've learned right around 1000 kanji characters. It's important to know that I chose sentences that contained common words I wanted to learn, and gave absolutely no consideration to their kanji content.
Also note how eerily straight the line is. This 1:2 ratio isn't some random fluke, but a consistent pattern. (I'm fairly sure, though, that this trend will break at around 4000 sentences at most.)
It's also interesting to mention that some of these 1000 characters are ones that I didn't learn when I was doing RTK1, either because they are in RTK3, or because they aren't in RTK at all. (I added them to my kanji deck when I encountered them, of course.)
I will let you draw the conclusions. But I get the feeling that the ratio between the number of words and the number of kanji characters in the JLPT lists is... odd...
What is your experience?
Last edited by kerecsen (2011 January 17, 3:09 pm)
A lot of the kanji in RTK 3 are actually common. Some are going to be in the new edition of RTK, since the Japanese government approved more kanji for the main list (the very common ones). I'm not surprised you found a lot of kanji outside of RTK vol. 1.
I found a lot outside of it as well. After a while you'll keep learning more and more.
Last edited by ta12121 (2011 January 17, 2:29 pm)
If you want to read
Popular literature - 1000 to 1500 kanji
Contemporary literature - 3000 to 7000 kanji
Classical literature - 15000 kanji.
buonaparte wrote:
Classical literature - 15000 kanji.
Why do people always say this? My research specialty is classical literature and I don't know anywhere close to 15000 kanji, nor do you need anywhere close to that. It depends on which classical literature and exactly what you're doing with it, of course, but I would be surprised if anyone in the field knows that many kanji.
yudantaiteki wrote:
Why do people always say this? My research specialty is classical literature and I don't know anywhere close to 15000 kanji, nor do you need anywhere close to that. It depends on which classical literature and exactly what you're doing with it, of course, but I would be surprised if anyone in the field knows that many kanji.
I agree. I don't know where you got your numbers, buonaparte, but contemporary literature must be 4000 kanji max, and classical literature must be maybe 6000 max (isn't that what the kanji exam (can't remember the name, keitei??) is for?).
Anybody knowing more is an expert in old Chinese.
I have seen chunks of the Hagakure, and you can probably read it with fewer than 4000 (most of them are just old styles, but recognizable).
Last edited by EratiK (2011 January 17, 3:57 pm)
This old post by yudantaiteki suggests that 1500-2500 might be a more accurate estimate for "contemporary literature" too, depending on your definition of "need to know".
kerecsen: interesting graph, thanks. My personal guess is that the slope will start levelling out well before 1500 kanji -- it would be cool if you could add a new chart in three to six months' time to let us know how it pans out.
6000 even seems high to me because it really depends on the literature. A lot of older literature (particularly in the Heian period) was written mostly in kana, and modern editions will actually add kanji to it to make it easier to decipher (although they usually use a lot of furigana).
My numbers come from a book by Wiesław Kotański, a university professor, who translated 万葉集 and 古事記 into Polish.
http://en.wikipedia.org/wiki/Man%27y%C5%8Dsh%C5%AB
http://en.wikipedia.org/wiki/Kojiki
Last edited by buonaparte (2011 January 18, 2:34 am)
Kojiki would be like the English equivalent of an Old English translation of some Norse mythology bible, transcribed phonetically using another alphabet that was known only to Old English scribes at the time.
You'd need to learn about 20,000 Old Norse words (and their pronunciation) and about 15,000 Old English words (+ pronunciation) to be able to understand it.
The number of people who bother to read a text is proportional to 1/X^2 where X is how many characters are supposedly in said text.
buonaparte wrote:
If you want to read
Popular literature - 1000 to 1500 kanji
Contemporary literature - 3000 to 7000 kanji
Classical literature - 15000 kanji.
That's way off.
I'd say a functional reading ability is about 2500. Upper levels of proficiency (I'm talking KICKASS) in reading is 3 ~ 4K.
1000 - 1500 isn't nearly enough to read novels, and I'd consider those to be popular literature. Newspapers are around 2100 as well. Visual novels usually lean towards the heavier side on kanji as well.
I'm currently at 2.8K and can read novels/games (with a dictionary for rarer words). About 300 of those kanji are what I'd consider in the "useless" pile, i.e. part of the 3K that's found exclusively in Kanken level 1. So I know 2500 kanji that pop up in everyday stuff. I can't imagine trying to read anything knowing only half of that...
At any rate I wouldn't aim for arbitrary numbers but rather keep learning kanji that pop up in your reading until you just wind up not being able to learn any more!
My best guess is it starts to really really realllllly slow down at 3500.
Last edited by mezbup (2011 January 18, 7:25 am)
The Kojiki and Man'yoshu use a lot of kanji, but most non-specialists (and even some specialists) will read those in editions that transcribe the man'yogana into standard orthography and have a lot of furigana.
I'm not saying you can't find people studying classical literature who need to know that many kanji. But to simply say "classical literature - 15000 kanji" is wrong.
mezbup wrote:
I'd say a functional reading ability is about 2500. Upper levels of proficiency (I'm talking KICKASS) in reading is 3 ~ 4K.
1000 - 1500 isn't nearly enough to read novels, and I'd consider those to be popular literature. Newspapers are around 2100 as well. Visual novels usually lean towards the heavier side on kanji as well.
That sounds about right to me. I really can't imagine getting through the book I'm reading now (translated version of The Firm) with 1000 or even 500 fewer characters than the 2400-ish I know. I think the best bet is to learn the full 常用 and then go from there at your own pace.
As for classical literature, I don't know why it gets mentioned at all. It's like telling English students how many additional words they need to know to read Shakespeare fluidly. None of them are going to care.
Well, some people do want to read literature, even classical literature. But classical literature (and literature in general) tends to have a lot of furigana, so vocab and grammar are a bigger barrier to reading than kanji.
2500 seems quite high for a "functional reading ability", but as usual it depends on your definitions of "read" and "know [a kanji]". If "know a kanji" just means that the kanji doesn't present a barrier to your comprehension of a real piece of writing (in context), then maybe it works. I know there are a lot of kanji that if you showed me them out of context I wouldn't be able to give you readings or meanings, but if I see them in context I can at least tell you what the word it appears in means.
Last edited by yudantaiteki (2011 January 18, 10:49 am)
Opinions より data!
A guy named Pomax made a word frequency list based on more than 1000 modern novels: http://pomax.nihongoresources.com/index … 1222520260 . I just analyzed that list, and found that:
the most frequent 89,776 words are composed of 4606 kanji,
50,000 words: 3900 kanji,
20,000 words: 2825 kanji,
5000 words: 1584 kanji,
1000 words: 474 kanji.
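For anyone who wants to reproduce this kind of count, here is a minimal sketch in Python. It is not the script used for the numbers above. It assumes the frequency list has already been loaded as a plain list of words ordered from most to least frequent, and it treats only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as kanji. The function name and the toy word list are made up for illustration.

```python
import re

# Assumption: "kanji" means the basic CJK Unified Ideographs block only.
# Rarer extension blocks and the iteration mark 々 are ignored.
KANJI = re.compile(r"[\u4e00-\u9fff]")


def distinct_kanji(words, n):
    """Count the distinct kanji used in the n most frequent words.

    `words` must already be ordered from most to least frequent.
    """
    seen = set()
    for word in words[:n]:
        seen.update(KANJI.findall(word))
    return len(seen)


# Toy stand-in for a real frequency list (hypothetical data).
sample = ["する", "言う", "人", "弦楽器", "音楽"]

for cutoff in (3, 5):
    print(cutoff, "words:", distinct_kanji(sample, cutoff), "kanji")
```

With a real list you would call it once per cutoff (1000, 5000, 20,000 and so on). Note that this only counts characters that appear; it says nothing about whether you can read the words.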
The same guy provides some more statistics here: http://pomax.nihongoresources.com/index … 1223045359 .
He found that one needs 7495 words (1903 kanji) for 95% understanding (every 20th word unknown).
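A figure like "N words for 95% understanding" is usually a cumulative coverage calculation over token counts. The sketch below shows that calculation under my own assumptions: the input is a list of per-word occurrence counts, and "understanding" is taken to mean the share of running words covered. I don't know exactly how Pomax computed his numbers, and the function name and counts are hypothetical.

```python
def words_needed(counts, target=0.95):
    """Return how many of the most frequent words are needed so that
    their occurrences cover at least `target` of all running words."""
    total = sum(counts)
    running = 0
    # Walk the words from most to least frequent, accumulating occurrences.
    for n, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target * total:
            return n
    return len(counts)


# Hypothetical occurrence counts for six words (100 running words in total).
toy_counts = [60, 25, 8, 4, 2, 1]
print(words_needed(toy_counts, 0.95))  # the top 4 words cover 97 of 100 tokens
```

The kanji figure for the same cutoff then comes from counting the distinct kanji in those top N words.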
Kanji-wise I'm approximately at that level (RTK1), and it's not that bad. Murakami's 1Q84 (first book) contains 2067 kanji. I'm reading it right now, and I can understand almost all unknown words at first sight. I had never seen 弦楽器 before, but it's really obvious, especially in context.
Last edited by nortalf (2011 January 18, 11:51 am)
nortalf wrote:
He found that one needs 7495 words (1903 kanji) for 95% understanding
I think the thing I take from that is that you really need quite a lot of words. If you've managed to get up to ~7500 words of vocabulary then you've probably acquired at least a recognition-level knowledge of a fairly high number of kanji without having to put any specific effort into it. Conversely, knowing 1900 kanji doesn't mean you have a 7500 word vocabulary. So counting kanji is a complete red herring -- if you have a big enough vocabulary to read a novel then you can read the novel, and that's the only measure that really matters.
nortalf wrote:
Opinions より data!
A guy named Pomax made a word frequency list based on more than 1000 modern novels: http://pomax.nihongoresources.com/index … 1222520260 . I just analyzed that list, and found that:
the most frequent 89,776 words are composed of 4606 kanji,
50,000 words: 3900 kanji,
20,000 words: 2825 kanji,
5000 words: 1584 kanji,
1000 words: 474 kanji.
This data doesn't answer the vague question, though. "There are X kanji in this novel" is not the same thing as "You need to know X kanji to read this novel" without any further qualification. It ignores the presence of furigana as well as the cases where you can figure out the second kanji of a compound because you know the word. It also assumes 100% comprehension, which is not necessarily a requirement depending on what you actually mean by "read".
Soseki's Kokoro has 1934 unique kanji. So that does mean that if you wanted to read a copy of the novel with no furigana at 100% comprehension you would need to know 1934 kanji if we assume that you are never able to figure out an unknown kanji from context. But this scenario is forced -- I've never seen an edition of Kokoro without any furigana, and 100% comprehension is an unreasonable and unnecessary goal. I also think it's very unlikely that a learner with the necessary vocab and grammar knowledge to read Kokoro would be utterly stumped whenever they encountered an unknown kanji, having no recourse except to look it up in the dictionary.
I just think that in general all this focus on counting kanji and coming up with numbers ignores the vital importance of vocab and grammar in reading, and reinforces the often unstated assumption beginning learners have that the main thing you have to do to read Japanese is learn 1945 symbols.
Last edited by yudantaiteki (2011 January 18, 1:25 pm)
pm215 wrote:
Conversely, knowing 1900 kanji doesn't mean you have a 7500 word vocabulary. So counting kanji is a complete red herring -- if you have a big enough vocabulary to read a novel then you can read the novel, and that's the only measure that really matters.
Couldn't agree more.
And don't forget listening comprehension - it is a different skill. And speaking. And then writing.
nortalf wrote:
Opinions より data!
A guy named Pomax made a word frequency list based on more than 1000 modern novels: http://pomax.nihongoresources.com/index … 1222520260 . I just analyzed that list, and found that:
the most frequent 89,776 words are composed of 4606 kanji,
50,000 words: 3900 kanji,
20,000 words: 2825 kanji,
5000 words: 1584 kanji,
1000 words: 474 kanji.
The same guy provides some more statistics here: http://pomax.nihongoresources.com/index … 1223045359 .
He found that one needs 7495 words (1903 kanji) for 95% understanding (every 20th word unknown).
Kanji-wise I'm approximately at that level (RTK1), and it's not that bad. Murakami's 1Q84 (first book) contains 2067 kanji. I'm reading it right now, and I can understand almost all unknown words at first sight. I had never seen 弦楽器 before, but it's really obvious, especially in context.
I remember reading up about this on another site. It seems that 4000 kanji / 50,000 words is what it takes to read through almost anything (99%?).
pm215 wrote:
Conversely, knowing 1900 kanji doesn't mean you have a 7500 word vocabulary.
It does. If you know the 2000 RTK1 kanji, you can easily guess the meaning of tens of thousands of words, especially the rare ones (the frequent words are less logical). My example above (弦楽器) is 45,609th on the mentioned frequency list, so it's pretty rare. Three 常用漢字: string + music + instrument, it's obvious.
yudantaiteki wrote:
"There are X kanji in this novel" is not the same thing as "You need to know X kanji to read this novel"
Yes, but it means that "If you know X kanji you can get the gist of this novel, even if you don't know the words/readings, or your grammar sucks."
Maybe the opposite is true too: "If you speak a lot and know a lot of words (but not how to write them), you can understand something with the help of the furigana.", just that it's hard to imagine for me.
yudantaiteki wrote:
[...] the main thing you have to do to read Japanese is learn 1945 symbols.
Eventually this will be necessary. My best Japanese-related decision was to take Mr Heisig's advice and book, because after that (3 months) I can read basically anything. My second best decision was to stop at 2000 kanji and wait to see if I need another 1000, because after 3 years I still don't need them.
Grammar and vocab are far more important for getting the gist of a piece of writing than kanji are. I've never found kanji meanings to be very reliable for guessing the meaning of words -- obviously there are some easy ones, but a lot of them are not.
"If you speak a lot and know a lot of words (but not how to write them), you can understand something with the help of the furigana.", just that it's hard to imagine for me.
You don't have to imagine it, it's true. Obviously I'm not talking about knowing 50 kanji or something like that, I'm comparing 1300 with 3000.
Last edited by yudantaiteki (2011 January 18, 3:16 pm)
nortalf wrote:
Eventually this will be necessary. My best Japanese-related decision was to take Mr Heisig's advice and book, because after that (3 months) I can read basically anything.
Your definition of 'read' is pretty loose imo...!
I have to imagine it because I never experienced it. Just like you (I assume) never experienced reading with strong kanji knowledge and zero grammar. We are coming from opposite directions, and will hopefully reach the same perfection in every field.
I've never found the readings to be very informative concerning the meaning, either.
Even kanji better than that...
nortalf wrote:
My example above (弦楽器) is 45,609th on the mentioned frequency list, so it's pretty rare.
...according to that list.
弦楽器 is not a rare word.
Aren't you corroborating Yudan's point when you say this?:
Eventually this will be necessary. My best Japanese-related decision was to take Mr Heisig's advice and book, because after that (3 months) I can read basically anything. My second best decision was to stop at 2000 kanji and wait to see if I need another 1000, because after 3 years I still don't need them.
@kerecsen, Where did you get that Kanji added over time graph?
JimmySeal wrote:
@kerecsen, Where did you get that Kanji added over time graph?
From the Anki plugin called "kanji and success graph 1.2"
ta12121 wrote:
nortalf wrote:
Opinions より data!
A guy named Pomax made a word frequency list based on more than 1000 modern novels: http://pomax.nihongoresources.com/index … 1222520260 . I just analyzed that list, and found that:
the most frequent 89,776 words are composed of 4606 kanji,
50,000 words: 3900 kanji,
20,000 words: 2825 kanji,
5000 words: 1584 kanji,
1000 words: 474 kanji.
The same guy provides some more statistics here: http://pomax.nihongoresources.com/index … 1223045359 .
He found that one needs 7495 words (1903 kanji) for 95% understanding (every 20th word unknown).
Kanji-wise I'm approximately at that level (RTK1), and it's not that bad. Murakami's 1Q84 (first book) contains 2067 kanji. I'm reading it right now, and I can understand almost all unknown words at first sight. I had never seen 弦楽器 before, but it's really obvious, especially in context.
I remember reading up about this on another site. It seems that 4000 kanji / 50,000 words is what it takes to read through almost anything (99%?).
I'd say 20,000 is more like 99% and 50,000 is more like 99.9%.

