
For anyone who likes meaningless character count statistics

#1
In this thread I seek to settle - once and for all - the age-old question: How many hanzi do you have to know to read Chinese?

Just kidding. I'm here to throw out some approximate numbers that could have been obtained much more accurately and several thousand times faster using a computer program. But hopefully my plodding through it semi-manual style will provide some inspiration to someone.
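
For anyone who does want the computer-program route mentioned above, here is a minimal Python sketch of counting distinct hanzi in a piece of text. The names `is_hanzi` and `distinct_hanzi` and the choice of Unicode ranges (the main CJK Unified Ideographs block plus Extension A) are assumptions for illustration, not anything actually used in this thread. Unlike the manual tally here, it treats simplified and traditional variants as separate characters.

```python
def is_hanzi(ch: str) -> bool:
    """True if ch is in the main CJK Unified Ideographs block or Extension A."""
    code = ord(ch)
    return 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF


def distinct_hanzi(text: str) -> set[str]:
    """Collect every distinct hanzi in text, skipping punctuation, Latin, kana, etc."""
    return {ch for ch in text if is_hanzi(ch)}


# Small sample; a real run would read the full text of a book instead.
sample = "政府行政措施"
print(len(distinct_hanzi(sample)))  # 5, because 政 appears twice
```

Feeding it the full text of each book and taking the union of the resulting sets would give the overall count.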

So as of yesterday, I've finished reading all of the following (most of them multiple times) in Chinese, while harvesting every distinct character I could find:
Inuyasha, volumes 1-4
Suzuka (shonen manga), volumes 1-3
Narnia, book 3
Harry Potter, book 2
Two comics about 西遊記
A small handful of news articles, song lyrics, and other assorted text

And the approximate number of distinct characters I've found (not counting simplified/traditional variants) is: 3200

Next up is Narnia, book 5, followed by Harry Potter, book 5, and my hypothesis is that going through those two will bring my count up by about 300-400, which puts my estimate for close-to-complete coverage of characters in young adult fiction at about 3500 hanzi. I assume the average young adult reader of Chinese probably wouldn't know every single one of these characters by memory, so I suppose 3000 would be the typical number to approach works of this level.

I'll be back in several months to report on my findings.

Ciao.
#2
But two of them are kids' books? And some pretty light-on-text manga?


Not my idea of a book with comprehensive vocab...
#3
Zgarbas Wrote:But two of them are kids' books? And some pretty light-on-text manga?
*ahem*
JimmySeal Wrote:...puts my estimate for close-to-complete coverage of characters in young adult fiction at about 3500 hanzi.
I fully plan to read more advanced stuff soon enough, but one step at a time. That said, after reading the first chapter of Narnia 5 today, I think I'm still underestimating this language and need to push that estimate up a bit. Lots of horse-riding vocab in this one.
#4
There's a difference between children's books and young adult fiction...
#5
Seems like a bit of a hazy distinction to me, especially in the HP case, but I'll grant you that you're probably right. Feel free to read my post substituting "children's" for "young adult." :)
Edited: 2012-01-29, 2:12 pm
#6
This is very interesting... thanks for sharing JimmySeal.

I'm a little surprised that it only amounts to around 3500; I figured it'd be a bit higher.

Please keep us updated... I'll be interested to see how/if that figure jumps as you progress to more advanced things.

By the way, how did you go about calculating this?
#7
I'm not all that surprised by the number. I mean, even 古文觀止 only has 4438 unique characters, and you're talking about an anthology covering two thousand years of literature here. 3500 characters for some modern books written for kids/young adults doesn't seem all that low, to be honest.
Edited: 2012-02-02, 5:55 am
#8
JimmySeal, where do you buy your books/manga?
#9
JimmySeal Wrote:Seems like a bit of a hazy distinction to me, especially in the HP case, but I'll grant you that you're probably right. Feel free to read my post substituting "children's" for "young adult." :)
Books 1, 2, and 3 are children's books in my opinion. Maybe by 4, and definitely by 5, the books were being written to grow with their audience. So by the end we definitely have young adult fiction.
Edited: 2012-02-02, 7:18 am
#10
nadiatims Wrote:JimmySeal, where do you buy your books/manga?
http://www.books.com.tw/ is where I shop, as I'm making traditional Chinese my primary focus. Good service and speed and a large selection there, and they also have music and DVDs and whatnot.

Before I knew about that site, I bought HP5 from http://www.amazon.cn/, which appears to be a mainland China site and sells simplified Chinese books. No complaints there either.

zer0range Wrote:By the way, how did you go about calculating this?
Since Chinese readings are very consistent, unlike Japanese, I'm maintaining a 1-hanzi-per-card Anki deck. I start off a new card with 1 word or phrase that includes the character and quiz myself on the reading. If I encounter more words that I want to study that contain that character, I add them to the same card, and just quiz myself on the first word. So just to take one at random:

(Question)
政府

「我們是來質詢你行政措施、」

(Answer)
ㄓㄥˋ ㄈㄨˇ

ㄒㄧㄥˊ ㄓㄥˋ

And as long as I get the reading for 政府 right, I pass the card.

So my total number of cards is basically the number of characters I've covered so far. There are a few cases where I have more than 1 card per character:
- The character has multiple readings, in which case I give each reading its own card.
- I want to drill simplified character -> traditional character when they are very different (e.g. 灭 -> 滅), so I make a card for practicing that.
- I goof up and accidentally make two cards for one character.
So it's not a precise count, but I don't have very many of those duplicate cards, so the count is pretty close.
Edited: 2012-02-02, 12:04 pm
#11
Update:
So I'm 8 pages away from finishing Narnia 5, but I've just passed the halfway point in terms of entering stuff into Anki. My current card count stands at 3363, which is just about 100 more than my count when I made that original post (I had subtracted 50 as a fudge factor). So this is right on par with my original estimate. On the other hand, I think after I'm done with HP5, I'll probably be at about 3700-3800 cards, which would exceed my original estimate. We'll see.
#12
I read these posts with interest and 4000 would be my estimate though I know next to nothing about Chinese...
#13
I'm confused. So 3363 is just individual hanzi characters?

What about compounds that actually use those characters? Are you adding those to Anki and tracking their number too?
#14
I gave a thorough explanation of how my deck works just three posts up. I generally have just one card per character, with one or more words/compounds/phrases that contain that character. I drill myself on the reading of the card's main character (or the compound that contains it, if it's in a compound) in just the first item that's listed on the card. That's how I know that my card count is close to the number of distinct characters in my deck.
#15
Sorry, I totally missed your explanation.

Ok cool, that makes sense.
#16
In case anyone is still tuning in...

I finished entering all my new Anki stuff for Narnia book 5 a while back, and when I was done, my card count stood at 3430, i.e. about 180 previously unknown characters overall for the book. This is about what I was expecting, and I guess the moral of the story here is that once you've got the characters in one or two books down pat, you can take on something at the same level/genre and encounter comparatively few unknown characters.

After that, I went through Narnia 3 for a final time, and picked up about 10 more characters that I'd missed or that didn't have their own Anki card yet.

I'm now about 1/3 of the way through HP 5. It was a slog at first, but the difficulty level has leveled out and it's starting to feel like smooth sailing at this point. After entering 7 chapters' worth of my notes into Anki (out of 39 chapters), my card count is up by 80, so I'm estimating this book probably has about 350 unknown characters for me altogether and will bring my total count to about 3750.
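
As a rough check on that estimate, here is the naive straight-line extrapolation in Python. The function name is made up for illustration, and the linear assumption is only a simplification: it treats every remaining chapter as adding new characters at the same rate as the first seven, so it gives an upper-end figure. The estimate of 350 above is lower than that, which is consistent with expecting new characters to taper off as the book goes on.

```python
def project_total_new(new_so_far: int, chapters_done: int, chapters_total: int) -> float:
    """Straight-line projection: later chapters add new characters at the same rate."""
    return new_so_far / chapters_done * chapters_total


# 80 new cards after 7 of 39 chapters (figures from the post above).
linear = project_total_new(80, 7, 39)
print(round(linear))  # about 446
```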
#17
Pretty sure I'm just talking to myself at this point, but I may as well follow through with what I started.

I'm 2 chapters away from finishing HP5, but I've just passed the halfway point in terms of entering stuff into Anki. That brought my card count to 3598, or in other words, about 160 new characters for 285 pages. Given that Narnia 5 brought me 180 new characters for 215 pages and the HP pages are more densely packed with text, this is a great sign. I'm predicting about 100 new characters for the second half of the book.
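
To put the two books on the same footing, the figures above can be turned into new characters per page. This is just a sketch of the arithmetic in Python; the helper name `new_per_page` is invented for the example, and it ignores the difference in text density per page mentioned above.

```python
def new_per_page(new_chars: int, pages: int) -> float:
    """Average number of previously unknown characters per page."""
    return new_chars / pages


narnia5 = new_per_page(180, 215)         # Narnia 5, whole book
hp5_first_half = new_per_page(160, 285)  # HP5, first 285 pages

print(f"Narnia 5: {narnia5:.2f} new characters per page")         # 0.84
print(f"HP5 so far: {hp5_first_half:.2f} new characters per page")  # 0.56
```

Since the HP pages carry more text each, the drop per character read is larger than these per-page numbers suggest.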

At this point reading this book has become pretty effortless. That's not to say that I understand every single thing (I don't), but it's nowhere close to the painfest that I had with HP2, and this one was at a higher reading level and 3 times longer. It's awesome to be at this point when a year ago, HP5 was looking like a herculean task.

I'll post again in a few weeks with the final tallies for this book.
#18
I wonder how much harder it is for a 12-year old Chinese person to read HP compared to a 12-year old Japanese, given the lack of furigana in Chinese. The 12-year old Chinese would not literally need to know 100% of the characters because they would probably be able to figure out some of the vocab from context, but still...
Edited: 2012-06-11, 12:56 am
#19
yudantaiteki Wrote:I wonder how much harder it is for a 12-year old Chinese person to read HP compared to a 12-year old Japanese, given the lack of furigana in Chinese. The 12-year old Chinese would not literally need to know 100% of the characters because they would probably be able to figure out some of the vocab from context, but still...
Yeah, that's definitely something that I've been curious about. One would think that the furigana-laden Japanese HP books would be a lot easier for native speakers than their Chinese counterparts. On the other hand, Chinese kids have a few things going for them reading-wise: Having no kun-yomi means they can learn more characters in less time than Japanese kids. Also, the characters' phonetic components are a lot more helpful in Chinese, so kids probably have a relatively easy time recognizing words they've learned aurally but haven't learned how to read yet. I think the latter point means that native Chinese speakers and native English speakers probably have a similar experience learning to read. They don't necessarily know how to pronounce every word, but they can make a ballpark estimate, and can often recognize words that they've heard before.

One of the biggest headaches for me for a while was the vast multitude of characters with the 口 radical on the left (as in 啦 咯 囉 嗲). These are mostly used for sound effects and exclamations and there are tons of them. My guess is that Chinese kids probably just gloss over these and pronounce them as though they don't have the 口 radical, unless they've specifically learned otherwise. I've finally gotten to the point where I don't run into that many new 口 characters anymore.
#20
So I've finally finished entering my HP5 cards into Anki and that brought my card count to 3690, or in other words I was able to get through the whole book with only about 260 new characters. This is pretty encouraging considering where I was a year ago, and it's almost 100 fewer than I was originally projecting.

I'm not sure if I'm going to keep this thread going any longer, but next up, I'm re-reading the 西遊記 and Inuyasha comics one last time, followed by re-reading Narnia 5, then some Grimm's fairytales, a light novel, and then a full novel by a Taiwanese author. Wish me luck!
#21
JimmySeal, keep on posting your progress because these are very insightful (at least for me)!

Can't remember if you said when you started HP5?

Oh by the way, Good Luck ! (which is far from being luck actually^^)

@+,
Dunki
#22
Hi Dunki, I'm glad you've found my posts insightful. I'll continue posting in that case. I started HP5 right around the middle of April and finished reading it on June 13. My goal was to finish reading it by the end of June (for a total of 2.5 months), so I met that goal. I've laid out a reading schedule for the rest of the year and I find that having a plan like that is useful for staying on track. I'll be rereading stuff I've already gone through until the middle of September, and then I'm aiming to read the Grimm book between September and the end of October, so I'll be back to post new findings around that time.
#23
Hi JimmySeal,

If you don't mind, can you tell us how long you have studied Chinese and what the main steps (stages) of your learning path were? I am now more than curious about how you reached such a level and how you keep up your commitment.

@+,
Dunki
#24
Hi Dunki,

Sorry for taking so long to get back to you and I hope I’m not rambling too much in my description below.

Over my lifetime, I've made a number of attempts at learning Chinese, but this was the first time I managed to follow through with it and reach "cruising altitude", so to speak. I've been at it for right about 2 years so far.

I should point out that I was quite proficient in reading Japanese before I started this attempt, so I was already familiar with a lot of characters and compounds. On the other hand, I think that going through volumes 1 and 2 of RtH would probably put you on relatively equal footing to where I was when I started.

The approach I’ve used is relatively simple:
1. Read books and comics
2. Write down unknown characters (including context where useful)
3. Look up the pronunciations for the unknown characters (I’ve used Wakan and zhongwen.com mostly)
4. Put them into Anki
5. Drill them in Anki
6. Continue reading

Naturally there are a lot of caveats here. So far, most of the reading I’ve done has been in either children’s novels, some of which I’ve already read in English, or children’s level comic books. Doing this enables me to ascertain a lot of words’ meanings entirely from context, without looking the meanings up (I almost never look words’ meanings up; my approach to language learning has been heavily influenced by the comments in these two pages: http://www.lingua.org.uk/voc.html http://www.lingua.org.uk/vocdb.html).

To avoid burning out from looking up too many unknown characters while making too little forward progress, I restricted the characters I wrote down and looked up to those that I had seen more than once. This still amounted to quite a lot at first, and HP2 was quite a slog the first two times I went through it. The first time, I set it aside after reading 1/3 and moved on to something easier. On the next try I read 2/3 of the book, and the third time I got all the way to the end. On the second read-through of HP2, the unknown character density was still very high, so again I focused only on the unknown characters that I noticed more than once that time around. Having cleared away a lot of unknown characters by that point let me narrow my focus to repeated unknown characters that I hadn’t noticed the first time through.
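
For anyone working from digital text rather than paper, the seen-more-than-once rule described above could be sketched in Python like this. The function name, the `known` set, and the `min_count` parameter are all hypothetical, and the check covers only the main CJK Unified Ideographs block; the actual process in this thread was done by hand while reading.

```python
from collections import Counter


def repeated_unknowns(text: str, known: set[str], min_count: int = 2) -> list[str]:
    """Return unknown hanzi that occur at least min_count times, most frequent first."""
    counts = Counter(
        ch for ch in text
        if "\u4e00" <= ch <= "\u9fff" and ch not in known
    )
    return [ch for ch, n in counts.most_common() if n >= min_count]


# 府 is already known; 政 appears twice, everything else only once.
print(repeated_unknowns("政府行政措施", known={"府"}))  # ['政']
```

Lowering `min_count` to 1 on a later read-through would pick up the characters skipped the first time.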

Narnia 5, which I started right when I began this thread, was the first novel where I was able to pick up all of the unknown characters on the first time through. I was able to do the same with HP5, so I have a good feeling that I’m going to be able to do that from now on.

One more thing that I did to prevent burnout was to set daily limits: I write down my unknown characters on strips of paper that fit about 20 rows of text per column, so I would stop reading for the day when I reached 20 unknown characters, or reached the end of the chapter I was reading, whichever came first.

Although this method is pretty straightforward, that’s not to say that it is altogether easy. It has taken a lot of time and work, but the progress I’ve made has been very rewarding.

So the following are the principles I’ve followed to keep me on track and motivated. Most I’ve already mentioned above and am summarizing here:
- Start off with relatively easy material, and material where I can infer meaning from context, either because I’m familiar with the story and setting (as with HP and Narnia) or because there are pictures (in the case of the comics).
- Focus on more important characters (those that appear a lot) at first, then reread and pick up the ones I skipped.
- Reread books multiple times, to reinforce what I’ve learned and to give my mind a break from constantly taking on new stuff. But don’t reread a book right after finishing it, as that can be repetitive and demotivating.
- Set daily limits (in my case, 20 unknown characters or one chapter, whichever comes first)
- Set short term reading goals and try to stick to them (e.g. finish reading book X by the end of August, figure out how many days/chapter I need to average in order to reach that goal)
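
The days-per-chapter figure in the last point is simple arithmetic; a minimal Python sketch follows. The function name is invented, and the exact dates are assumptions chosen to match a 39-chapter book started in mid-April with an end-of-June goal.

```python
from datetime import date


def days_per_chapter(start: date, deadline: date, chapters_left: int) -> float:
    """Average number of days available for each remaining chapter."""
    return (deadline - start).days / chapters_left


# Hypothetical dates: start on April 15, aim to finish by June 30.
pace = days_per_chapter(date(2012, 4, 15), date(2012, 6, 30), 39)
print(f"{pace:.1f} days per chapter")  # 1.9
```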

All of the above is how I got to where I currently am, and I think I have enough under my belt to stick through it for the long haul. I hope this has been helpful to you.
#25
Thanks a lot, JimmySeal!
There is a lot of useful and relevant information in your reply.
I am just digging into it right now.
The links you provided are excellent, and I think they fit my own views about the learning process.
I do not regret talking with you about this subject! :-)

@+,
Dunki