Learning Japanese fast - Why not use frequency lists for 80% coverage?

Reply #1 - 2009 June 19, 2:04 pm
scuda Member
From: カナダ Registered: 2008-11-02 Posts: 60

I was looking at two frequency lists, one for kanji and one for words. 

From wikipedia kanji frequency list http://forum.koohii.com/viewtopic.php?id=3367:
80% coverage from 555 kanji
85% coverage from 688 kanji
90% coverage from 874 kanji
95% coverage from 1214 kanji
96% coverage from 1327 kanji
97% coverage from 1473 kanji
98% coverage from 1684  kanji
99% coverage from 2058 kanji

From newspaper word frequency list http://ftp.monash.edu.au/pub/nihongo/ (wordfreq_ck):
80% coverage from 2164 words
85% coverage from 3364 words
90% coverage from 5578 words
95% coverage from 10958 words
98% coverage from 20748 words

These frequency lists may have some biases; however, I think they are useful for thinking about our approach to learning Japanese quickly.  The main thing that stands out for me is that there are a lot more words to memorize than kanji.  Getting to 99% kanji coverage is relatively easy compared to getting to 80% word coverage.
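For anyone who wants to reproduce figures like these from their own data, here is a minimal Python sketch of how a coverage table can be derived from raw counts. The function name `coverage_thresholds` and the toy counts are made up for illustration; they are not taken from either list above.

```python
from collections import Counter

def coverage_thresholds(counts, target_percents):
    """Map each target percentage to the number of top-ranked items
    needed to cover at least that share of all occurrences."""
    total = sum(counts.values())
    remaining = sorted(target_percents)
    result = {}
    running = 0
    # Walk items from most to least frequent, accumulating occurrences.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        # Integer comparison avoids floating-point edge cases.
        while remaining and running * 100 >= remaining[0] * total:
            result[remaining.pop(0)] = rank
    return result

# Toy counts, invented for illustration only (not real frequency data).
toy = Counter({"の": 50, "に": 30, "日": 10, "本": 5, "語": 3, "鬱": 2})
print(coverage_thresholds(toy, [80, 90, 95]))
```

With real data, `counts` would hold one entry per kanji (or per word) and its number of occurrences in the corpus.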

It probably takes about the same amount of time to learn a word as it takes to learn a kanji character, provided of course that the kana & kanji components of the word are already understood.

Even though it is so much easier to get to 99% kanji, is it worth doing so before studying words?  Going from 95% to 99% coverage in kanji is 844 kanjis.  That could have been 844 words that resulted in 68.75% coverage in words.  What is better, a 4% gain in kanji or a 68.75% gain in words?

Maybe it would be a lot more efficient to switch from kanjis to studying words after 80% or 90% coverage in kanjis.  Then when we get to 80% or 90% words, we can go back and pick up the more uncommon kanjis.

It seems that to quickly get to 80% coverage of kanji and words, it would only require memorizing 555 of the most common kanjis and 2164 of the most common words.  At a rate of 30 items/day, that would result in 80% coverage of the Japanese language in a mere 90 days or so, about 3 months.

By comparison, getting to 90% coverage would be 874 kanji and 5578 words, and would take another 125 days, or about 4 months.
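The arithmetic behind those estimates can be checked with a short Python sketch. The item counts come from the two lists quoted above; the helper name `days_to_reach` is hypothetical, and the calculation assumes every item is learned once at a steady 30 per day, with no time budgeted for reviews.

```python
import math

ITEMS_PER_DAY = 30  # study rate proposed in this post

# Item counts copied from the kanji and word lists quoted above.
kanji_needed = {80: 555, 90: 874}
words_needed = {80: 2164, 90: 5578}

def days_to_reach(percent, per_day=ITEMS_PER_DAY):
    """Days needed to learn all kanji and words for a coverage level."""
    items = kanji_needed[percent] + words_needed[percent]
    return math.ceil(items / per_day)

d80 = days_to_reach(80)
d90 = days_to_reach(90)
print(d80, d90, d90 - d80)  # prints: 91 216 125
```

Rounding up gives 91 days for the 80% level rather than 90, and the extra 125 days for the 90% level matches the figure above.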

Thoughts?


BTW, does anyone know if there is a stripped down RTK list that avoids uncommon kanjis?  RTK lite is aimed at JLPT, and I don't think JLPT is based on frequencies either.

Reply #2 - 2009 June 19, 2:20 pm
Aijin Member
From: California Registered: 2009-05-29 Posts: 648

Studying kanji before studying any of the words that use those kanji seems a little odd to me. As you study characters you should simultaneously study the compounds that utilize various readings of that character. If you learn 10 words per character that use that character, by the time you're at 99% kanji you reach 98% vocabulary as well (though some characters don't have many words that use them)

Avoiding uncommon kanji seems a bit unwise to me also. There are many common and important words that use kanji that are otherwise uncommon.

Reply #3 - 2009 June 19, 2:26 pm
ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

How about getting a list of the most frequent compounds for everyday Japanese, and then from that, the most useful kanji to learn, in order? (edit: the resulting list would be very similar... I guess)
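A rough Python sketch of that idea: weight each kanji by the total frequency of the words that contain it, then sort. The function names and the word counts are invented for illustration, and the kanji test only covers the basic CJK Unified Ideographs block.

```python
from collections import Counter

def is_kanji(ch):
    # Basic CJK Unified Ideographs block only; extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"

def kanji_order_from_words(word_freq):
    """Rank kanji by the summed frequency of the words that use them."""
    weight = Counter()
    for word, freq in word_freq.items():
        # set() so a kanji repeated inside one word is counted once per word.
        for ch in set(word):
            if is_kanji(ch):
                weight[ch] += freq
    return [kanji for kanji, _ in weight.most_common()]

# Invented counts, for illustration only.
words = {"日本": 120, "日本語": 40, "今日": 60, "語る": 5, "です": 300}
print(kanji_order_from_words(words))
```

Kana-only words such as です contribute nothing, so the ordering is driven purely by the compounds.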

Reply #4 - 2009 June 19, 2:32 pm
scuda Member
From: カナダ Registered: 2008-11-02 Posts: 60

Most people here seem to attempt to do most of RTK first before studying the readings, compounds, and words.  I personally suspect it is faster to learn a fundamental set of things before building on that, such as learning compounds and readings.  I think learning multiple things at the same time slows things down.

Also, I realize that for an 80% or 90% kanji list, it would still need to be cross-checked against an 80% or 90% word list to find out if there are some kanjis that need to be included before being ready to study the relevant word list.
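That cross-check could look something like the following Python sketch. The function name `missing_kanji` and the sample data are hypothetical; real input would be the 80% or 90% kanji list and the matching word list.

```python
def missing_kanji(known_kanji, words):
    """For each kanji not in known_kanji, list the words that need it."""
    needed = {}
    for word in words:
        # dict.fromkeys() removes repeated characters but keeps their order.
        for ch in dict.fromkeys(word):
            is_kanji = "\u4e00" <= ch <= "\u9fff"  # basic CJK block only
            if is_kanji and ch not in known_kanji:
                needed.setdefault(ch, []).append(word)
    return needed

# Invented example data.
known = set("日本語")
word_list = ["日本", "挨拶", "語学"]
print(missing_kanji(known, word_list))
```

Any kanji that shows up in the result would have to be added to the kanji list before starting on the word list.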

EDIT: Or what the administrator said :)

Last edited by scuda (2009 June 19, 2:41 pm)

Reply #5 - 2009 June 19, 2:38 pm
erlog Member
From: Japan Registered: 2007-01-25 Posts: 633

This sounds good on paper, but you are going to be unsatisfied with it. Those frequency lists don't mean you'll be able to understand 80% of everything. It's actually more like if there's 10 kanji in a sentence then you'll understand 8/10 of them. Or if there's 10 words in a sentence then you'll only be able to understand 8 of them.

You're confusing frequency with importance for understanding meaning. A kanji's frequency of appearing has little to do with how important it is for understanding. In fact, you could make the case that frequency and importance have an inverse relationship. The less frequent kanji are probably more important because they are only used when they are necessary. The same goes for words.

Frequency and importance are always taken into account in all language learning materials. Most materials do go in some order of frequency or importance, and so does the JLPT2. They all tend to go roughly by the 常用 kanji list broken down by grade level as written by the Japanese Ministry of Education.

You can continue to fool yourself into thinking that if you just do 600 kanji then bang zoom wow you'll be able to understand so much, but I'm here to tell you that you are dead wrong. I know about 1200-1500 kanji. I know about 3000 words.

It's damn tough for me to understand anything for anyone written above a 4th grade level. I can understand all sorts of unimportant mundane details about when and where stuff is happening, but a lot of times I haven't the foggiest the what, how, or why.

Look at the words on that list, and think about how important they are for you in imparting meaning in a sentence in English. How important is the word Monday in this sentence: Monday, my father died. -or- Yesterday, I started choking while eating an apple.

Those mundane words are the most common like Monday, my, father, apple, and eating. The less common words are where the true meaning of the sentence lies.

Last edited by erlog (2009 June 19, 2:44 pm)

Reply #6 - 2009 June 19, 2:39 pm
Codexus Member
From: Switzerland Registered: 2007-11-27 Posts: 721

scuda wrote:

It seems that to quickly get to 80% coverage of kanji and words, it would only require memorizing 555 of the most common kanjis and 2164 of the most common words.  At a rate of 30 items/day, that would result in 80% coverage of the Japanese language in a mere 90 days or so, about 3 months.

80% coverage sounds nice in theory, but think of it this way: for every 5 words there is one that you don't understand.

Frequency lists are a nice starting point and great when you don't know what to study next. But ultimately there is no shortcut. It's going to take time to learn those 20748 words and then some more. :D

Reply #7 - 2009 June 19, 2:42 pm
vosmiura Member
From: SF Bay Area Registered: 2006-08-24 Posts: 1085

Using RTK, it is good to do a lot of kanji while you're in the groove.  That's one reason for doing about 2000 upfront with RTK.  Another reason is that as Aijin mentioned, even some rarer kanji still get used in fairly common words.  It's hard to tell upfront what kanji you're going to need.

However if you don't have the patience to finish all of RTK first, I think doing RTK Lite (or other frequency based list) first, and then learning kanji by RTK method as you encounter them can also be a valid method.  Doing RTK Lite would probably be enough to learn most of the primitives, and then you can apply that to other kanji in any order.

Edit: And yeah, knowing 80% most frequent words is still just the tip of the iceberg - and generally means little to no comprehension of native Japanese speech & writing.

Last edited by vosmiura (2009 June 19, 2:45 pm)

Reply #8 - 2009 June 19, 2:48 pm
scuda Member
From: カナダ Registered: 2008-11-02 Posts: 60

erlog wrote:

You're confusing frequency with importance for understanding meaning. A kanji's frequency of appearing has little to do with how important it is for understanding. In fact, you could make the case that frequency and importance have an inverse relationship. The less frequent kanji are probably more important because they are only used when they are necessary. The same goes for words.

Frequency and importance are always taken into account in all language learning materials. Most materials do go in some order of frequency or importance, and so does the JLPT2. They all tend to go roughly by the 常用 kanji list broken down by grade level as written by the Japanese Ministry of Education.

Good point about the inverse relationship, it is probably true in a lot of cases.

I assumed JLPT2 would have a poor kanji list because of its standardized and conventional nature.  Maybe I should take another look, perhaps RevTK lite would be equivalent to basically 93% coverage or so.

Last edited by scuda (2009 June 19, 2:52 pm)

Reply #9 - 2009 June 19, 3:07 pm
drivers99 Member
From: Alamogordo NM Registered: 2009-03-31 Posts: 141

Statistics error... knowing 80% of the words that appear in a large collection of newspapers does NOT mean that you only know 4 out of 5 words per sentence, because the words within that list are NOT randomly used.  I don't know what the figures would be though, and my google-fu isn't finding it.
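For reference, here is what the naive model would predict if words really were drawn independently at random, which is exactly the assumption being questioned here. This is a Python sketch with a made-up helper name, `p_fully_known`; it is a baseline for comparison, not a claim about real text.

```python
def p_fully_known(coverage, n_words):
    """Chance that every word in an n-word sentence is known,
    assuming (unrealistically) that words occur independently."""
    return coverage ** n_words

for coverage in (0.80, 0.95, 0.98):
    chance = p_fully_known(coverage, 10)
    print(f"{coverage:.0%} coverage: {chance:.1%} of 10-word sentences fully known")
```

Under that assumption, 80% coverage would leave only about one 10-word sentence in ten with no unknown words. Real text clusters rare words by topic, so the true figures would differ.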

Reply #10 - 2009 June 19, 3:23 pm
erlog Member
From: Japan Registered: 2007-01-25 Posts: 633

scuda wrote:

I assumed JLPT2 would have a poor kanji list because of its standardized and conventional nature.  Maybe I should take another look, perhaps RevTK lite would be equivalent to basically 93% coverage or so.

I thought the same thing when I first started studying Japanese. I was highly skeptical of the 常用 kanji list because it was created in the days before computers. How could they have possibly known which kanji to include and which not to?

The answer is that they were smart. In the research I did on the list (it was quite exhaustive), with the exception of a few kanji here and there, the 常用 kanji list lines up pretty much precisely with the frequency data I was able to generate. All of the places it deviates are places where you could easily say, "Yeah, but this kanji is more important despite being less frequent."

drivers99 wrote:

Statistics error... knowing 80% of the words that appear in a large collection of newspapers does NOT mean that you only know 4 out of 5 words per sentence, because the words within that list are NOT randomly used.  I don't know what the figures would be though, and my google-fu isn't finding it.

I think everyone in this discussion knows this. We're just making a point about how recognizing 80% of all the words doesn't mean you'll get 80% of the meaning. We're actually making this point for you, but in a different way.

Last edited by erlog (2009 June 19, 3:24 pm)

Reply #11 - 2009 June 19, 3:32 pm
ahibba Member
Registered: 2008-09-04 Posts: 528 Website

Aijin wrote:

Studying kanji before studying any of the words that use those kanji seems a little odd to me.

When you started learning English, didn't you learn the alphabet first, before any of the words that use those letters? Or did you learn it this way: A apple, B banana? How can you read the word "apple" if you have not studied the letters p, l, e yet?

Reply #12 - 2009 June 19, 3:50 pm
vosmiura Member
From: SF Bay Area Registered: 2006-08-24 Posts: 1085

Well, normally as a native you learn to speak before you learn to read and write.

However the way Japanese kids learn kanji is at a relatively leisurely pace, and enforced over years of reading & writing exposure.  Adult foreign learners often have neither the patience nor the exposure to learn the same way and reach the same level.

In the context of RTK some of the ideas are "divide and conquer" as well as "strike the iron while it's hot".  That's why we learn to write & recognize lots of kanji first in one go, and then start learning words after.  It's not like it's very long - I mean 3 months (which seems average for RTK1) is such a short time on the long road to learning Japanese.

Last edited by vosmiura (2009 June 19, 4:04 pm)

Reply #13 - 2009 June 19, 4:28 pm
yukkuri_kame Member
From: Florida US Registered: 2008-05-30 Posts: 185

erlog wrote:

I thought the same thing when I first started studying Japanese. I was highly skeptical of the 常用 kanji list because it was created in the days before computers. How could they have possibly known which kanji to include and which not to?

The answer is that they were smart. In the research I did on the list (it was quite exhaustive), with the exception of a few kanji here and there, the 常用 kanji list lines up pretty much precisely with the frequency data I was able to generate. All of the places it deviates are places where you could easily say, "Yeah, but this kanji is more important despite being less frequent."

Another possible answer is that the 常用 is a bit of a self-fulfilling prophecy.  If those are the characters that the Japanese learn in school, then they will be the ones most commonly used in modern Japanese.  Also, consider that, while Japanese is evolving, many of the new words are kana words, and I doubt many new kanji are being created.

Reply #14 - 2009 June 19, 4:58 pm
Aijin Member
From: California Registered: 2009-05-29 Posts: 648

ahibba wrote:

Aijin wrote:

Studying kanji before studying any of the words that use those kanji seems a little odd to me.

When you started learning English, didn't you learn the alphabet first, before any of the words that use those letters? Or did you learn it this way: A apple, B banana? How can you read the word "apple" if you have not studied the letters p, l, e yet?

Actually I knew a handful of English words before ever learning the alphabet :P That happens when you're exposed to common words in media before you've even begun formal education on the language though.

I don't know if the analogy is fair. I'd compare the roman alphabet more to hiragana/katakana, which of course has to be learned before it can be used. But for kanji, it's hard for me to imagine learning only how to write the characters first, and then having to go back and learn all the readings, and then on top of that having to learn all the vocabulary that uses those various readings.
To me it makes more sense to do it at the same time, so that you can instantly use what you learn. If you only learn how to write a character and its rough English meaning, it has no practical application until you learn the readings and words that use it. But, I think that probably both ways of learning have their advantages and in the end equal the same amount of time. So whatever works best for people is great :)

I would be interested in hearing more about how you guys feel that style helps you/is better than learning everything at once though! Pros/cons would be great.

Reply #15 - 2009 June 19, 5:34 pm
jokoto Member
Registered: 2007-03-12 Posts: 63

There are statistical "which word is used the most in this language" lists for lots of languages available on the internet. Unfortunately I do not have links to any of these pages at hand.

scuda wrote:

I assumed JLPT2 would have a poor kanji list because of its standardized and conventional nature.  Maybe I should take another look, perhaps RevTK lite would be equivalent to basically 93% coverage or so.

I did learn the JLPT 2 kanjis from the RevTK lite list and it is a great start. After finishing it, you can start immediately with any beginner Japanese textbook while learning the remaining JLPT 1 kanjis if needed. RevTK takes around half the time.

Reply #16 - 2009 June 19, 5:35 pm
ahibba Member
Registered: 2008-09-04 Posts: 528 Website

Aijin wrote:

for kanji, it's hard for me to imagine learning only how to write the characters first, and then having to go back and learn all the readings, and then on top of that having to learn all the vocabulary that uses those various readings.
To me it makes more sense to do it at the same time, so that you can instantly use what you learn. If you only learn how to write a character and its rough English meaning, it has no practical application until you learn the readings and words that use it.

James Heisig wrote:

"STUDYING THE KANJI

The big question is, of course, how to train one’s mind to read and write Japanese. There are those who simplify matters by deciding that there is no need for persons educated outside of the Japanese school system to bother learning how to write the language. If you can read, you will remember how to write a few hundred of the kanji along the way and you can leave the rest to computers to handle for you. Or so the argument goes. It has the full support of most Japanese who have never met a Western-educated individual who can write the kanji with the same fluency as they and have somehow decided that, without the benefit of an education in writing that begins at the pre-school level and goes all the way up to the last year of high school, there is no way they ever could. This is not only the case for ordinary readers of Japanese but also for the great masses of scholars of Japanese scholarship in the West. The hiragana and katakana, and perhaps a third-grade level of writing—but more than that is unreasonable to expect.

If you accept the argument, you are solidly in the majority camp. You would also be as wrong as they are. To begin with, there is no reason you cannot learn to write the kanji as fluently as you read them, and in a fraction of the time it takes to do it through the Japanese school system. What is more, without the ability to write, you are forever crippled, or at least limited to walking with the crutch of an electronic dictionary or computer. Finally, by learning to write you have helped to internationalize the fullness of the Japanese language beyond the present-day limits.

All of this is common sense to the Korean and Chinese who come to Japan to learn the language. The reason Westerners tend to dismiss it is their fear of not being able to learn to write, or at least not without devoting long years to the task. As I said, this fear is unfounded.

The key to learning to write is to forget the way the Japanese learn and pay attention instead to the way the Chinese learn Japanese, and then adapt it to the West. Consider the following diagram.

http://img14.imageshack.us/img14/8555/flowerjai.png

http://img14.imageshack.us/img14/4966/flower2l.png


The conclusion should be obvious:  If you want to learn to read and write all the general-use kanji, you should study them separately.

Which one do you start with, the reading or the writing? You might be surprised, but the answer is—the writing. There are two reasons. First, by doing so you end up in basically the same position as the Chinese coming to the study of Japanese kanji: you know what they mean and how to write them, but you still have to learn how to pronounce them. Second, the writing is a rational system that can be learned by principles, whereas the readings require a great deal of brute memory."


Aijin wrote:

I don't know if the analogy is fair. I'd compare the roman alphabet more to hiragana/katakana, which of course has to be learned before it can be used.

oregum wrote:

I think of RtK as a very complicated form of spelling. Imagine not knowing the alphabet of a completely foreign language that has 3007 common words. Instead of trying to memorize each word as a picture (kanji), you break it up into letters (primitives). By knowing all the primitives you can build words. Like any language, the letters must be put in the correct order for the word (kanji) to make sense.

Last edited by ahibba (2009 June 19, 5:37 pm)

Reply #17 - 2009 June 19, 5:38 pm
kazelee Rater Mode
From: ohlrite Registered: 2008-06-18 Posts: 2132 Website

Aijin wrote:

To me it makes more sense to do it at the same time, so that you can instantly use what you learn. If you only learn how to write a character and its rough English meaning, it has no practical application until you learn the readings and words that use it. But, I think that probably both ways of learning have their advantages and in the end equal the same amount of time. So whatever works best for people is great :)

I would be interested in hearing more about how you guys feel that style helps you/is better than learning everything at once though! Pros/cons would be great.

For me, the bad thing about learning vocab along with reading and the kanji is that there is a lot of nothing. The kanji mean nothing. The readings mean nothing. Some would say context will tell you what you need to know, but if you're just starting then context means nothing, so you drudge through the dictionary.  Once you've finally managed to memorize all these elements together you then have to go back and learn context.

By using RTK or the Movie Method and then mining sentences, there's still a lot of drudging if you can't fully understand the context; however, you're learning context as you learn vocabulary and readings. The work's about equal as you say, but one seems less stressful than the other.

If there were an efficient way to learn them all at once, though, I'm sure people would be doing so.

Reply #18 - 2009 June 19, 6:41 pm
drivers99 Member
From: Alamogordo NM Registered: 2009-03-31 Posts: 141

erlog wrote:

drivers99 wrote:

Statistics error... knowing 80% of the words that appear in a large collection of newspapers does NOT mean that you only know 4 out of 5 words per sentence, because the words within that list are NOT randomly used.  I don't know what the figures would be though, and my google-fu isn't finding it.

I think everyone in this discussion knows this. We're just making a point about how recognizing 80% of all the words doesn't mean you'll get 80% of the meaning. We're actually making this point for you, but in a different way.

Edit: My point is not to argue with you, but I want to leave this post up because I think it has a good point about how much you can get out of knowing just the most common words.

I take your point though, you're saying that the more rare a word is the more important it would be to the meaning of the sentence.  But those words are going to be few and far between, maybe one per paragraph on average (and less than that after the terms related to the subject matter have been established). In that case you would either look it up or get a general idea through context.  So, I think studying vocabulary by frequency is a great idea.  (I wouldn't stop at 80% of the usual use kanji though... I mean, that's just 400 kanji... just do it.)

Just to prove my point (in English anyway), I just put the above paragraph into this http://www.oup.com/elt/catalogue/teache … iler?cc=gb (after converting all contractions into their component words, because it doesn't work with contractions for some reason) and the only words it didn't have in the list of Oxford 3000 English words were "kanji" and "paragraph."  Also, the word "frequency" was from a specialist list. I didn't intentionally write it with that in mind; it was an afterthought.
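The same kind of check can be sketched in a few lines of Python for any word list you have on hand. This is not the Oxford tool; the function names and the tiny `known` set are assumptions for illustration, and the tokenizer is a crude regex with no handling of contractions or inflected forms.

```python
import re

def tokenize(text):
    # Crude tokenizer: lowercase runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())

def unknown_words(text, known):
    """Distinct words in text that are not in the known set, in order seen."""
    return list(dict.fromkeys(t for t in tokenize(text) if t not in known))

def token_coverage(text, known):
    """Share of running words in text that appear in the known set."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in known for t in tokens) / len(tokens)

# Tiny invented word list, standing in for a real 3000-word list.
known = {"i", "study", "by", "frequency"}
sample = "I study kanji by frequency"
print(unknown_words(sample, known), token_coverage(sample, known))
```

A real run would load the full list into `known` and would probably need to reduce words to their dictionary forms first.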

Last edited by drivers99 (2009 June 19, 6:50 pm)

Reply #19 - 2009 June 19, 7:44 pm
scuda Member
From: カナダ Registered: 2008-11-02 Posts: 60

drivers99 wrote:

Statistics error... knowing 80% of the words that appear in a large collection of newspapers does NOT mean that you only know 4 out of 5 words per sentence, because the words within that list are NOT randomly used.  I don't know what the figures would be though, and my google-fu isn't finding it.

drivers99 wrote:

I take your point though, you're saying that the more rare a word is the more important it would be to the meaning of the sentence.  But those words are going to be few and far between, maybe one per paragraph on average (and less than that after the terms related to the subject matter have been established). In that case you would either look it up or get a general idea through context.  So, I think studying vocabulary by frequency is a great idea.  (I wouldn't stop at 80% of the usual use kanji though... I mean, that's just 400 kanji... just do it.)

It isn't clear if this is actually favorable or detrimental to understanding the target material.  But, you seem to be suggesting it actually is favorable?  That knowing 80% of the words might help to understand 90% of the material?

I wonder if there is a threshold of word coverage that gives just enough understanding to pick up all the new words entirely through context, without needing to rely on a dictionary.  As a native English speaker, I picked up the vast majority of my vocabulary via reading many books as I grew up, and I never used the dictionary to learn words.

Last edited by scuda (2009 June 19, 7:50 pm)

Reply #20 - 2009 June 19, 8:11 pm
uberstuber Member
Registered: 2007-03-27 Posts: 238

Aijin wrote:

I don't know if the analogy is fair. I'd compare the roman alphabet more to hiragana/katakana, which of course has to be learned before it can be used. But for kanji, it's hard for me to imagine learning only how to write the characters first, and then having to go back and learn all the readings, and then on top of that having to learn all the vocabulary that uses those various readings.

I learned about 200 characters the 'standard' way, by learning the writing, meaning, and readings at the same time. I found it very difficult to do this, and my retention rate was horrid. My theory is that the 'chunk size' of learning everything at once is too large. For me, there was too much information to connect to one character at once. Only learning the writing and a keyword has a smaller, more digestible 'chunk size.'

Incorporating the reading is also more difficult because it's harder to make a connection to a seemingly arbitrary reading. Some try to incorporate a reading into their writing mnemonic, but I found I would have to stretch my stories too far for them to be memorable.

I never actually went through and learned readings individually; I couldn't do it. After finishing RTK, I went AJATT style and learned readings through new vocabulary. This worked great for me.

To me it makes more sense to do it at the same time, so that you can instantly use what you learn. If you only learn how to write a character and its rough English meaning, it has no practical application until you learn the readings and words that use it.

I fail to see how learning an arbitrary reading lets you use a character immediately, unless you already knew words using that reading. I think learning new vocab after knowing characters is more efficient, because you have more things to connect the new vocab to while learning.

Learning a character and its rough meaning doesn't have a direct practical application, but it provides major benefits. For me, it removed the feeling of 'omg what are all these random scribbles I'll never be able to learn this stuff' and replaced it with 'wow, I recognize most of these; I can actually become literate.' I likely would have stopped studying Japanese had I not stumbled upon RTK.
Also, going through RTK allows you to use real Japanese text as study material basically as soon as you are finished. You won't understand much or be able to actually read it, but you can study with something you enjoy, rather than being stuck with boring graded readers for a year or two while you learn kanji the standard way.
Also, when learning new words, already knowing the general meaning of kanji allows you to see the logic behind many compounds, and gives another mental hook to remember the word.

But, I think that probably both ways of learning have their advantages and in the end equal the same amount of time. So whatever works best for people is great :)

I would be interested in hearing more about how you guys feel that style helps you/is better than learning everything at once though! Pros/cons would be great.

Basically, I see it like this:
RTK is a 3 month investment of not getting any 'practical' use. In return, you get a huge stepping stone towards literacy. When I decided to go through RTK, I had no need for output, only a desire to understand. I also had stopped taking Japanese classes, and wasn't bound by a specific curriculum.
Standard study gets you through the kanji slower, but nets 'practical' results right away. For those who want immediate results, or need to produce output for class/work/etc, this may be better.

Reply #21 - 2009 June 19, 8:33 pm
mentat_kgs Member
From: Brasil Registered: 2008-04-18 Posts: 1671 Website

@ahibba
I don't know your source but it looks far from reliable. If you cite where you took it from, it might help.

First, ideogram is not a very good word for it. They are actually logograms.  They are morphological elements of vocabulary, so Aijin has a very good reason to find learning them first counterproductive.

But doing RTK first, with an SRS, is good for a strategic reason. It frees you from "beginner content". It gives you a chance to learn from interesting content, right from the beginning.

My fear when I first started learning Japanese, long before knowing about RTK was to stick to dumbed down text for ages before finding something nice to read.
This is sometimes called a bottom->up approach.

You could also start by only listening and speaking. A top->down approach. But reading is such an important skill that I didn't want to delay its development.

RTK, as proposed by Heisig and popularized by Khatzumoto, gave me confidence that I could enjoy interesting content after just 3 months. This meet-in-the-middle approach works very well. Its main advantage is that you learn the basics of Japanese together with "high level Japanese". This was vital for keeping me interested and motivated in my journey.

Reply #22 - 2009 June 19, 8:52 pm
eroichigo Member
Registered: 2009-03-07 Posts: 20

drivers99 wrote:

erlog wrote:

drivers99 wrote:

Statistics error... knowing 80% of the words that appear in a large collection of newspapers does NOT mean that you only know 4 out of 5 words per sentence, because the words within that list are NOT randomly used.  I don't know what the figures would be though, and my google-fu isn't finding it.

I think everyone in this discussion knows this. We're just making a point about how recognizing 80% of all the words doesn't mean you'll get 80% of the meaning. We're actually making this point for you, but in a different way.

Edit: My point is not to argue with you, but I want to leave this post up because I think it has a good point about how much you can get out of knowing just the most common words.

I take your point though, you're saying that the more rare a word is the more important it would be to the meaning of the sentence.  But those words are going to be few and far between, maybe one per paragraph on average (and less than that after the terms related to the subject matter have been established). In that case you would either look it up or get a general idea through context.  So, I think studying vocabulary by frequency is a great idea.  (I wouldn't stop at 80% of the usual use kanji though... I mean, that's just 400 kanji... just do it.)

Just to prove my point (in English anyway), I just put the above paragraph into this http://www.oup.com/elt/catalogue/teache … iler?cc=gb (after converting all contractions into their component words, because it doesn't work with contractions for some reason) and the only words it didn't have in the list of Oxford 3000 English words were "kanji" and "paragraph."  Also, the word "frequency" was from a specialist list. I didn't intentionally write it with that in mind; it was an afterthought.

That 3000 thing was really interesting. I wonder if anyone here who knows how to program could do something like that but only sample dialogs. For instance, just the parts where people are speaking to each other in novels or from movie scripts/television. I think such a list would be awesome for those of us trying to understand most Japanese entertainment.
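As a starting point, here is a minimal Python sketch that pulls out only the text inside 「」 quotation brackets and counts the kanji in it. The function names and the sample sentence are invented for illustration. It counts characters rather than words, because splitting Japanese into words needs a morphological analyzer, which this sketch does not attempt; it also ignores nested quotes and dialog marked in other ways.

```python
import re
from collections import Counter

# Matches the text between one pair of 「」 brackets (no nesting).
DIALOG = re.compile(r"「([^「」]*)」")

def dialog_lines(text):
    """Return the quoted speech found in text, without the brackets."""
    return DIALOG.findall(text)

def dialog_kanji_counts(text):
    """Count kanji that occur inside quoted speech only."""
    counts = Counter()
    for line in dialog_lines(text):
        # Basic CJK Unified Ideographs block only.
        counts.update(ch for ch in line if "\u4e00" <= ch <= "\u9fff")
    return counts

# Invented sample text.
sample = "彼は言った。「今日は寒いね」彼女は笑った。「本当に寒い」"
print(dialog_lines(sample))
print(dialog_kanji_counts(sample).most_common())
```

Narration outside the brackets is skipped, so 彼 and 言 never reach the counts.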

Reply #23 - 2009 June 19, 9:12 pm
Jarvik7 Member
From: 名古屋 Registered: 2007-03-05 Posts: 3946

When talking about the word frequency list that is in circulation, keep in mind it is from a financial newspaper and thus the list has no relation to daily-use Japanese or even to Japanese that you'll find on tv and in books. Frequency lists only help you with the material that the list was generated from. In the case of this list, the sample was much too specific.

A wikipedia-based word frequency list would be much more useful since it covers (or attempts to) every subject there is, but it would still somewhat suffer from being Japanese written in an academic/formal style.

I encourage Shang to use ChaSen instead of his own tool and re-run his analysis for more accurate information, but the list as-is is very useful in comparison to the financial one.

Last edited by Jarvik7 (2009 June 19, 9:14 pm)

Reply #24 - 2009 June 19, 10:30 pm
scuda Member
From: カナダ Registered: 2008-11-02 Posts: 60

Jarvik7 wrote:

When talking about the word frequency list that is in circulation, keep in mind it is from a financial newspaper and thus the list has no relation to daily-use Japanese or even to Japanese that you'll find on tv and in books. Frequency lists only help you with the material that the list was generated from. In the case of this list, the sample was much too specific.

You might be interested to know that there are a variety of word frequency lists at http://ftp.monash.edu.au/pub/nihongo/

There's one that was done on novels and another that was done on blogs, etc.  Those are probably a lot more useful.

Reply #25 - 2009 June 19, 10:35 pm
Jarvik7 Member
From: 名古屋 Registered: 2007-03-05 Posts: 3946

Cool, I never knew about the second two of the list. The first one is the most common and is the one I was talking about.

This one seems to be the best, but it's not listed on Breen's site: http://corpus.leeds.ac.uk/list.html