How many readings do you need to know?

jajaaan Member
From: America Registered: 2009-11-14 Posts: 115

Womacks23 wrote:

jeez guys I already answered this on the third post

And I thank you for it.

Ben Bullock Member
Registered: 2010-01-19 Posts: 61

Evil_Dragon wrote:

Ben Bullock wrote:

No, it's called bullshit.

Out of curiosity, how many Kanji do you think are necessary to, let's say, read a good book?

Just out of curiosity, I did some work counting kanji in various books I downloaded from Aozora Bunko. Here are the titles and the number of kanji:

三四郎 (no. 794): 1817
河童 (no. 69): 1226
人間失格 (no. 301): 1568
Unique kanji in above three books: 2271

Here's the computer code, if anyone needs it:

Code:

#!/usr/bin/env perl
# Count the distinct kanji in a Shift-JIS text file (e.g. one from Aozora Bunko).
use warnings;
use strict;
my $file = $ARGV[0];
if (! $file) {
    die "No file name";
}
if (! -f $file) {
    die "'$file' not found";
}
my %kanji;    # kanji => number of occurrences
open my $input, "<:encoding(shift-jis)", $file or die $!;
while (my $line = <$input>) {
    # Keep only characters in the CJK Unified Ideographs block.
    my @chars = grep /\p{InCJKUnifiedIdeographs}/, (split //, $line);
    for (@chars) {
        $kanji{$_}++;
    }
}
# The number of keys is the number of unique kanji.
print scalar keys %kanji, "\n";
close $input or die $!;
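
The script above reports one file at a time, so the "unique kanji in above three books" figure needs a union across files. A minimal sketch of one way to get both numbers in a single run, assuming the same Shift-JIS input and the same definition of kanji; the helper name count_kanji and the multi-file command line are illustrative assumptions, not part of the original script:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: add every kanji found in $text to the tally hash ref.
sub count_kanji {
    my ($text, $tally) = @_;
    $tally->{$_}++ for grep { /\p{InCJKUnifiedIdeographs}/ } split //, $text;
    return $tally;
}

# Assumed usage: pass any number of Shift-JIS files on the command line.
my %all;    # kanji seen in any file
for my $file (@ARGV) {
    open my $in, "<:encoding(shift-jis)", $file or die "$file: $!";
    my %seen;    # kanji seen in this file only
    while (my $line = <$in>) {
        count_kanji($line, \%seen);
        count_kanji($line, \%all);
    }
    close $in or die $!;
    print "$file: ", scalar(keys %seen), "\n";
}
print "Unique kanji overall: ", scalar(keys %all), "\n" if @ARGV;
```

Keeping one hash per file and one shared hash means the per-book counts and the combined count come from the same pass over the text.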

Last edited by Ben Bullock (2010 February 13, 4:04 am)

Womacks23 Member
From: 恵比寿 Registered: 2008-01-10 Posts: 596

You should try your code on postwar lit. I bet you'd get different results.

Evil_Dragon Member
From: Germany Registered: 2008-08-21 Posts: 683

Ben Bullock wrote:

The only actual analysis I've done was a project where I indexed every kanji out of about ten thousand documents in Japanese. The total number of kanji used was 2,400. That's from 10,000 documents. And many of the kanji were only used in one or two of the documents, or for one person's name.

10,000 documents of what kind? To me personally that number seems pretty low, especially for such a large number of documents.

Just for fun I made a deck out of Natsume Souseki's "I Am a Cat". Anki's kanji statistics are as follows:

2839 total unique kanji.
Old Jouyou: 1775 of 1945 (91.3%).
New Jouyou: 141 of 191 (73.8%).
Jinmeiyou (reg): 320 of 645 (49.6%).
Jinmeiyou (var): 5 of 145 (3.4%).
598 non-jouyou kanji.

Well, one could of course argue that Natsume Souseki is notorious for his use of kanji. This probably does not even slightly compare to any newspaper out there. ;) However, if this is the target one is aiming for (I am), good luck. ;)

(Off Topic, sorry)

nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

I know we've had some people do kanji counts in other threads, but I can't remember: has anyone done them on light novels? There's a bunch floating around *cough* raseru *cough* on Nyaatorrents and various forums...

Womacks23 Member
From: 恵比寿 Registered: 2008-01-10 Posts: 596

nest0r wrote:

I know we've had some people do kanji counts in other threads, but I can't remember: has anyone done them on light novels? There's a bunch floating around *cough* raseru *cough* on Nyaatorrents and various forums...

キノの旅

1353 total unique kanji.
Old Jouyou: 1217 of 1945 (62.6%).
New Jouyou: 48 of 191 (25.1%).
Jinmeiyou (reg): 41 of 645 (6.4%).
Jinmeiyou (var): 0 of 145 (0.0%).
47 non-jouyou kanji.


ゼロの使い魔


1484 total unique kanji.
Old Jouyou: 1273 of 1945 (65.4%).
New Jouyou: 70 of 191 (36.6%).
Jinmeiyou (reg): 68 of 645 (10.5%).
Jinmeiyou (var): 1 of 145 (0.7%).
72 non-jouyou kanji.
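
Breakdowns like these can be reproduced outside Anki by checking each kanji against a reference list. A rough sketch, assuming the list is available as an array of characters; the helper name list_coverage and the four-kanji stand-in list are made up for illustration, and a real run would load the full Jouyou or Jinmeiyou list from a file:

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: compare the kanji in a tally (kanji => occurrences)
# against a reference list given as an array ref of characters.
# Returns: list kanji used, list size, percentage of the list used,
# and the number of kanji in the tally that are not on the list.
sub list_coverage {
    my ($tally, $list) = @_;
    my %on_list = map { $_ => 1 } @$list;
    my $size    = keys %on_list;
    my $used    = grep { exists $tally->{$_} } keys %on_list;
    my $outside = grep { !exists $on_list{$_} } keys %$tally;
    my $percent = $size ? 100 * $used / $size : 0;
    return ($used, $size, $percent, $outside);
}

# Stand-in data: a four-kanji "list" and a tally containing one kanji off it.
my @list  = ("日", "本", "語", "猫");
my %tally = ("日" => 3, "本" => 1, "鼠" => 2);
printf "%d of %d (%.1f%%), %d outside the list\n", list_coverage(\%tally, \@list);
```

Running the same function once per list (old Jouyou, new Jouyou, Jinmeiyou) would give one line per category, in the same shape as the statistics quoted above.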

mezbup Member
From: sausage lip Registered: 2008-09-18 Posts: 1681 Website

Also, even if a book says it only uses 1800 kanji, that doesn't mean 1800 jouyou kanji. It could be 1500 jouyou and 300 non-jouyou. Run the same count on another book and get 1200, and that could be 1150 jouyou and 50 non-jouyou, and those 50 could be totally different from the 300 in the previous book. It cycles like this, I'll bet. That's why these numbers are an OK indicator of things but not a brilliant one.

Edit: you guys kinda beat me to illustrating the point! That's exactly what I was talking about though.

Evil_Dragon's count gives us a better picture of where the kanji lie. 598 non-jouyou? That's wayyy above needing 2000 kanji, and people are arguing "you don't even need 2000 to read"...? Hence what I said before: 1000 - 1500 to be at a point where you pick something up and start decoding it; heading up to 2000 you start being able to read but still rely on a dictionary; and if you can read 3000 kanji I'm sure your vocab would be huge, so the dictionary wouldn't be needed except in rare cases.

Novels are a really high level of reading IMO (way higher than a newspaper)

Last edited by mezbup (2010 February 13, 4:24 am)

Womacks23 Member
From: 恵比寿 Registered: 2008-01-10 Posts: 596

Most publications are loaded with furigana for the rare kanji.

Ben Bullock Member
Registered: 2010-01-19 Posts: 61

Evil_Dragon wrote:

Ben Bullock wrote:

The only actual analysis I've done was a project where I indexed every kanji out of about ten thousand documents in Japanese. The total number of kanji used was 2,400. That's from 10,000 documents. And many of the kanji were only used in one or two of the documents, or for one person's name.

10,000 documents of what kind? To me personally that number seems pretty low, especially for such a large number of documents.

It's lower than the numbers that people are pulling out of their imaginations.

Just for fun I made a deck out of Natsume Souseki's "I Am a Cat". Anki's kanji statistics are as follows:

2839 total unique kanji.

I got 2837.

Old Jouyou: 1775 of 1945 (91.3%).
New Jouyou: 141 of 191 (73.8%).
Jinmeiyou (reg): 320 of 645 (49.6%).
Jinmeiyou (var): 5 of 145 (3.4%).
598 non-jouyou kanji.

Of these, 514 kanji are used only once, and 250 are used only twice.
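
Figures like these fall out of the same tally once it is turned into a histogram of occurrence counts. A minimal sketch, assuming a hash of kanji => occurrences like the %kanji hash built by the counting script earlier in the thread; the helper name frequency_histogram and the toy data are illustrative only:

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: given a tally (kanji => occurrences), return a hash ref
# mapping an occurrence count to the number of kanji seen that many times.
sub frequency_histogram {
    my ($tally) = @_;
    my %histogram;
    $histogram{$_}++ for values %$tally;
    return \%histogram;
}

# Toy tally standing in for the real %kanji hash.
my %kanji = ("猫" => 5, "吾" => 1, "輩" => 1, "名" => 2);
my $hist  = frequency_histogram(\%kanji);
printf "used once: %d, used twice: %d\n", $hist->{1} // 0, $hist->{2} // 0;
```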

Well, one could of course argue that Natsume Souseki is notorious for his use of kanji. This probably does not even slightly compare to any newspaper out there. ;) However, if this is the target one is aiming for (I am), good luck. ;)

(Off Topic, sorry)

The goal seems to have more to do with boasting about numbers than actual reading of Japanese.

Jarvik7 Member
From: 名古屋 Registered: 2007-03-05 Posts: 3946

Newspapers aren't hard because they use lots of kanji, they are hard because they use a lot of WORDS that aren't frequently encountered in daily conversation or more casual reading material. Once you get to a high level of Japanese you'll see that kanji are no big deal, but unknown words are the problem. If everything was just a kanji problem I'd have kanken 1.5kyuu right now.

Last edited by Jarvik7 (2010 February 13, 4:39 am)

Womacks23 Member
From: 恵比寿 Registered: 2008-01-10 Posts: 596

mezbup, I think you're really underestimating the prevalence of furigana in Japanese books. "I Am a Cat" is read by most kids in Japan during high school. You'd be stretching it far to even find a kid who knows half of the 2,800 kanji used in that book.

So it's loaded up with furigana.

Evil_Dragon Member
From: Germany Registered: 2008-08-21 Posts: 683

Womacks23 wrote:

Most publications are loaded with furigana for the rare kanji.

They usually do this only once (or every once in a while) per word, but certainly not every time. Maybe I have a bad memory, but I frequently forget how to read certain words by the time they appear a second (or third) time. It would be nice though if they were a little more generous with furigana.

Ben Bullock wrote:

The goal seems to have more to do with boasting about numbers than actual reading of Japanese.

Yeah, soon the day will come when I'll know the Daikanwajiten from cover to cover and look down upon thee. Wait, how did you find out about my plan? ;)

Jarvik is right though. Why worry about kanji or readings? Vocabulary is far more important. You'll learn all the kanji you'll ever need on the way.

Last edited by Evil_Dragon (2010 February 13, 5:03 am)

mezbup Member
From: sausage lip Registered: 2008-09-18 Posts: 1681 Website

Ben Bullock wrote:

Of these, 514 kanji are used only once, and 250 are used only twice.

Such is the nature of frequency. Doesn't matter how many times it comes up, if you don't know how to read it it's going to break your reading flow when you look it up. Also, if a book is say 180 - 240 pages (roughly) that's a couple of kanji that are only used once PER page (on average). If you want to be able to read the book without using a dictionary it's actually going to take a very high level of knowledge. If you're cool with stopping a few times a page to look stuff up or skip past it you'll need far less knowledge and can still get by.
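
The per-page estimate can be checked with a couple of lines. A small sketch, assuming the 514 once-only kanji quoted above and the post's own rough page counts (180 and 240 are the post's guesses, not the measured length of any particular edition); the helper name per_page is made up for the example:

```perl
use strict;
use warnings;

# Average number of kanji per page for a given kanji count and page count.
sub per_page {
    my ($kanji_count, $pages) = @_;
    return $kanji_count / $pages;
}

my $once_only = 514;             # once-only kanji figure quoted above
for my $pages (180, 240) {       # assumed page counts from the post
    printf "%d pages: %.1f once-only kanji per page\n",
        $pages, per_page($once_only, $pages);
}
```

With these inputs the result is between two and three once-only kanji per page, which is where the "couple per page" figure comes from.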

You'll find when you start reading a lot of stuff (without furigana) on a daily basis that it's annoying to have to stop all the time to look stuff up.

Last edited by mezbup (2010 February 13, 5:24 am)

pm215 Member
From: UK Registered: 2008-01-26 Posts: 1354

Jarvik7 wrote:

Newspapers aren't hard because they use lots of kanji, they are hard because they use a lot of WORDS that aren't frequently encountered in daily conversation or more casual reading material.

Yeah, I think this is absolutely right. Also newspaper articles are on a huge range of different topics, so the words you've picked up from one article are no help in the next article. With novels generally quite a few of the words you didn't know in chapter one are going to reappear later on, so you get natural reinforcement of them.

My personal blind spot with reading novels is character names -- I wish they'd furigana them on every use, because I can never remember them :-)

To the OP: I'd suggest that a goal like "work through the whole JLPT2 (1) vocab list" would be more useful than working through a list of readings, and I think would still match the kind of long-term measurable progress you're after.

mezbup Member
From: sausage lip Registered: 2008-09-18 Posts: 1681 Website

I find if you read or watch news A LOT you quickly pick up the bulk of the news jargon and verbs that are commonly used to describe incidents. Then yeah, it's a matter of quite specific things being talked about that can trip you up (睡眠導入剤, "sleep-inducing drug", for example).

2.5 months ago if I watched a news clip I understood virtually none of it, but now I can follow what's going on, and if I don't catch something it's usually one of those words. Still lots of work to be done though. That and I've only been following crime, accidents, fire and death, so on finance, politics and foreign affairs I'd still be clueless.

JimmySeal Member
From: Kyoto Registered: 2006-03-28 Posts: 2279

mezbup wrote:

if you don't know how to read it it's going to break your reading flow when you look it up. Also, if a book is say 180 - 240 pages (roughly) that's a couple of kanji that are only used once PER page (on average).

You seem to have missed Womacks23's recent comment.  You said toward the beginning of this thread that one needs to know 2500-3000 kanji to read a book without a dictionary.  That's absurd.  If that were the case, Japanese would either (a) never read anything, or (b) carry dictionaries around wherever they go, 'cause they sure as heck don't all know that many kanji.  We all know that neither (a) nor (b) is true.

Your trouble is that you think you need to look up every word that you don't know.  Show me someone who claims to know every word they see in their native language, and I will show you someone who is either lying or delusional.

Last edited by JimmySeal (2010 February 13, 6:41 am)

Evil_Dragon Member
From: Germany Registered: 2008-08-21 Posts: 683

JimmySeal wrote:

'cause they sure as heck don't all know that many kanji.

As in "they" don't know how to write them or how to read them? The former, probably yes. The latter.. I'm not so sure about this one. If you ask people they'll say anything from 500 to 6000 (in my experience).

mezbup Member
From: sausage lip Registered: 2008-09-18 Posts: 1681 Website

JimmySeal wrote:

mezbup wrote:

if you don't know how to read it it's going to break your reading flow when you look it up. Also, if a book is say 180 - 240 pages (roughly) that's a couple of kanji that are only used once PER page (on average).

You seem to have missed Womacks23's recent comment.  You said toward the beginning of this thread that one needs to know 2500-3000 kanji to read a book without a dictionary.  That's absurd.  If that were the case, Japanese would either (a) never read anything, or (b) carry dictionaries around wherever they go, 'cause they sure as heck don't all know that many kanji.  We all know that neither (a) nor (b) is true.

Your trouble is that you think you need to look up every word that you don't know.  Show me someone who claims to know every word they see in their native language, and I will show you someone who is either lying or delusional.

I know what you mean. I don't always know every word I come across when I'm reading technical material or stuff that's very specific to a certain field and the word is jargon for that field. Other than that I can read a novel no problems and know all the words in it.

Denshi Jishos are awfully popular and if they didn't need them at all they wouldn't be all that popular. You don't see a hell of a lot of people in English speaking countries that carry a dictionary. Usually they have one at their house but don't often look in it. I've seen plenty carry them around. Give anyone a kanken test and see if they can nail the reading section 100% on levels higher than 3級... probably not? Time for the dictionary.

I think my point refers not to being able to read one book but being able to pick up book after book after book after book and have virtually no trouble. Even then you're missing my underlying point that I mention often, it's not kanji it's VOCAB.

No, you don't have to look up every word you don't know. No, you probably don't care about specialised vocab for certain fields, that's fairly normal. Yes, kanji outside the jouyou are used all the time in novels. Ergo, 2000 kanji only and you'll still need to reference things. That's if you're curious enough to look up words you don't know.

You know, this is an interesting debate. I'm going to ask as many Japanese as I can about it and see what they have to say. I do remember asking someone recently if they read a novel would there be things in there that they couldn't read and the answer was yes. So I don't think what I'm saying is totally UNTRUE which you seem to be saying it is.

Remember I'm talking about pretty much never needing a dictionary again. There's an insane difference between 95% and 99.5% and an equally insane one between 99.5% and 99.9%. I'm not talking about the first one, I'm talking somewhere in between the second and the third.
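
How steep that last stretch is can be measured from a frequency tally. A minimal sketch, assuming a hash of item => occurrences (kanji here, though the same function works for a word tally, which is closer to the vocab point); the helper name kanji_needed and the toy numbers are assumptions for illustration, not data from any book:

```perl
use strict;
use warnings;

# Hypothetical helper: how many of the most frequent items are needed so that
# together they account for at least $target (between 0 and 1) of all occurrences.
sub kanji_needed {
    my ($tally, $target) = @_;
    my $total = 0;
    $total += $_ for values %$tally;
    return 0 if $total == 0;
    my ($covered, $needed) = (0, 0);
    # Walk from the most frequent item down, stopping once the target is met.
    for my $count (sort { $b <=> $a } values %$tally) {
        last if $covered / $total >= $target;
        $covered += $count;
        $needed++;
    }
    return $needed;
}

# Toy tally: one very common item, then a long tail of rarer ones.
my %tally = (a => 9000, b => 600, c => 360, d => 35, e => 4, f => 1);
for my $target (0.95, 0.995, 0.999) {
    printf "%.1f%% coverage needs %d of %d items\n",
        $target * 100, kanji_needed(\%tally, $target), scalar keys %tally;
}
```

On a real tally the interesting output is how much the required count grows between the 95% and 99.9% lines, since the extra coverage has to come from the rare tail.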

Fillanzea Member
From: New York, NY Registered: 2009-10-02 Posts: 534 Website

mezbup wrote:

There's a difference between decoding (stopping all the time and using the dictionary), reading with a dictionary (stopping seldom to look something up but other than that reading at good speed uninterrupted for periods) and fluent reading where you don't have to look anything up and can read uninterrupted for long periods at a time.

For the first one, 1000 - 1500 "kanji" is enough.
For the second 2000 - 2500 "kanji" is enough.
For the third 2500 - 3000 "kanji" is enough.
For never ever having to use a dictionary again it's gonna be more like 3000 - 7000 "kanji"

I get that this is a debate that will never be settled, probably, but:

I estimate that I knew about 700-800 kanji when I started reading Yoshimoto Banana's "Kitchen," Murakami's "Sputnik Sweetheart," and a number of light novels. This wasn't decoding, it was reading with occasional dictionary lookup.

That didn't mean I knew every single word. There were a ton that I didn't know. But it was enough that I could get the substance of what was happening and glide over the words that I didn't know. You don't have to know the meaning of every word to be able to read at a pretty high level of fluency. (You can try this in English -- take a text and delete, say, every seventh or every tenth content word).
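
The deletion experiment is easy to set up. A rough sketch, with one loudly stated simplification: it blanks every n-th whitespace-separated word, not every n-th content word, since telling content words from function words would need a word list or tagger. The helper name blank_every_nth is made up for the example:

```perl
use strict;
use warnings;

# Replace every $n-th word of $text with a blank. Simplification: every
# whitespace-separated token counts, not only content words.
sub blank_every_nth {
    my ($text, $n) = @_;
    my $i = 0;
    $text =~ s/(\S+)/ ++$i % $n == 0 ? "____" : $1 /ge;
    return $text;
}

print blank_every_nth(
    "It was enough that I could get the substance of what was happening", 7
), "\n";
```

Trying it with 7 and then 10 on a page of English gives a feel for how much of a text survives that rate of unknown words.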

I doubt I know more than 1200 or so kanji now, but I've read Soseki, Tanizaki, a little bit of Mishima, a lot of Murakami...

It's not about whether you have to use a dictionary to understand every single word. It's about whether you have to use a dictionary to understand the text as a whole.

mezbup Member
From: sausage lip Registered: 2008-09-18 Posts: 1681 Website

It's true you don't need to know every piece of information to understand the greater portion of a text, but you have to admit that you aren't actually reading or understanding the parts you skip over because you can't read them; you're just skipping them and guessing. The brain is good at filling in blanks. I'm not saying you can't read, just that I don't consider an incomplete understanding to be "fluent reading". Not getting a handful of words over the course of an entire novel is fine, but not getting a few per page is not.

Fillanzea Member
From: New York, NY Registered: 2009-10-02 Posts: 534 Website

"Fine" for what, though? For whose purposes?

I'm not trying to take a test to prove how much I can understand (well, not since I passed JLPT 1, anyway). I'm reading for pleasure, to expose myself to Japanese, and to get a broad understanding of the world of Japanese literature.

If I can understand enough to follow the story, then I can read for pleasure. Language exposure is still extremely valuable if you can understand 90% or so -- it cements grammar structures and familiar vocabulary, and a few incidental contacts with an unfamiliar word can prime you for it when you do look it up. I have very little trouble remembering a new vocabulary word if it's already quasi-familiar to me from reading. Now, am I going to be able to do literary analysis at a high level if I have a lot of gaps in my understanding? Well, no, probably not, but high school teachers have no problem with making kids read Shakespeare anyway, and I can tell you that without a dictionary I'll understand more of Murakami than of Shakespeare.

It IS fine. It's fine for me. It's fine for my purposes. It's a heck of a lot better than if I never tried to read any of those books until I knew X number of kanji.

yudantaiteki Member
Registered: 2009-10-03 Posts: 3619

mezbup: But doing any reading at all is better than not doing anything because you're afraid that if you don't know 3000 kanji you're not going to be doing "fluent reading".  I am pretty sure that I don't know 3000 kanji but I rarely encounter characters I don't know in things I read in contemporary Japanese, either for fun or for research.  I would be surprised if I even know 2000, but I'm not sure.

Soseki's books were originally published with furigana over every kanji (as was the norm for prewar popular literature, before the Touyou Kanji list).

Also, BTW, there are around 800 kanji on the Jouyou list that do not have "official" kun readings, so there are a lot more than just 了 that only have one.  Of course once you get beyond the Jouyou list, any kanji can in theory have a kun reading -- the Kanjigen gives 4 kun-yomi for 了 -- おわ(る), おえ(る), さと(る), and つい(に).  The likelihood of seeing any of these readings outside of a 漢文 text, though, is very small.  I searched Google and couldn't find anything, but I did find the horrifying 和了る, read あがる, as a mahjong term.

Last edited by yudantaiteki (2010 February 13, 8:00 am)

JimmySeal Member
From: Kyoto Registered: 2006-03-28 Posts: 2279

mezbup wrote:

Other than that I can read a novel no problems and know all the words in it.

Right.  So which are you, lying or delusional?

Denshi Jishos are awfully popular and if they didn't need them at all they wouldn't be all that popular.

I'd suppose part of it stems from their obsession with gadgetry, part of it from the fact that quality denshi jishos exist in Japan, and part of it from being lifelong English learners, though many could never use English to save their lives.  It has nothing to do with being unable to "read fluently" in Japanese without a dictionary.

Give anyone a kanken test and see if they can nail the reading section 100% on levels higher than 3級... probably not? Time for the dictionary.

I have no idea what this means.

I think my point refers not to being able to read one book but being able to pick up book after book after book after book and have virtually no trouble.

I can pick up book after book after book with virtually no trouble.  Does that mean I know every word in each book? Hell no.  But I'm not having any trouble.

I'm going to ask as many Japanese as I can about it and see what they have to say. I do remember asking someone recently if they read a novel would there be things in there that they couldn't read and the answer was yes.

This is exactly what I said in my last post, so I don't know what your point is here.

Remember I'm talking about pretty much never needing a dictionary again.

Why would this be a goal for anyone?  Dictionaries exist so that you can refer to them from time to time.  Not so that you can memorize their contents and never use them again.

mezbup Member
From: sausage lip Registered: 2008-09-18 Posts: 1681 Website

You reckon you know 1200 but passed JLPT1? I don't doubt you passed JLPT1; sounds like your reading is at a pretty good level! I think perhaps you know more than you think you do, numbers-wise.

If you're happy, you're happy and there's no arguing with that. IMO you only learn kanji and vocab BY reading (after you're done with "study materials").

I'd still argue a few unknown words per page isn't fluent reading to a native level. It's at the reading with a dictionary level but just skipping things. It isn't a complete understanding. If it's 1 or 2 words per chapter that's not at all bad and you could definitely call that fluent reading. If it's 1 or 2 per book then you've reached a very high level.

I skip stuff when I read for pleasure but look up what I want to know or feel I need to know for comprehension purposes. When reading news, tech articles or web pages I usually look up every word I don't know for learning purposes. I understand there's a big difference, pleasure reading should be pleasurable.

Personally I'm aiming for a very high level of reading, I don't see the point in aiming low.
Why the hell not be able to read 3000 kanji? If you read all the time and Anki whatever you come across you'll get there eventually anyway. It'll probably take me the next 11 months to get to a level I'm OK with and 24 months to get to the level I'm aiming for.

Fillanzea Member
From: New York, NY Registered: 2009-10-02 Posts: 534 Website

I do think I have a kind of vague, ambient knowledge of more than 1200 kanji. And I did study kanji to pass JLPT 1, but I don't remember that much of what I studied.

I certainly agree that you only learn kanji and vocab by reading, and that's precisely why I wanted to start reading even when the number of kanji I'd studied was still fairly low. I tried for a while to study kanji, but what I came to realize was that it would tumble right out of my head unless I'd built up a familiarity with that kanji through reading.

I may be using "fluent" in a different sense than you are; I don't mean "native-level." I mean fluidly, smoothly. If I look at my advanced Japanese classes in college, a lot of my classmates essentially read word by word by word; they could know all the individual words but they had a hard time processing the text as sentences and paragraphs. And I think it's because I spent a ton of time outside class reading native materials, even when there were several words per page I didn't understand.

I do put a bunch of sentences in Anki. If I put everything I came across into Anki then either I would spend my whole life doing Anki, or I would have to severely cut down my pleasure reading. So I set some time aside to read for pleasure (usually on the subway) and some time to put sentences into Anki. I am certainly aiming for a level where I can read the major Meiji literature with a high level of fluency -- I just think that reading and vocabulary study are going to be more useful for that than kanji study.

Anyway, it seems like we're not actually that far apart. I didn't mean to get combative. I just resented the implication that I wasn't doing anything more than decoding.