
Kanji Compounds: It's over 9000! Seriously, though - over 70,000!?

#1
I'll be done with RTK1 by Sunday, and am realizing that there is a huge chunk of Japanese literacy that I haven't seen mentioned much - Kanji compounds. I just saw a number that mentioned there's over 70,000 of them. AAAAAHH!!! What the heck? I can't make time for that! If I had seen that number before I picked up RTK1, I bet I would have been too intimidated to even try to learn Japanese.

Are there lists of high frequency kanji compounds for the Japanese educational system, that we can go off of?

When a kanji pops up in Japanese writing, is it usually IN a compound?

Or are they rare? Will the sentence mining take care of it all?

What a scary number. @_@
#2
I'll be done learning the English alphabet by Sunday. There's a huge chunk that I haven't even seen much about: words. If we're just talking 4-letter words here, there are 26^4 = 456,976 possible combinations. AAHHH!

Sorry if I'm sounding really facetious here, but don't get too worried. There are numbers somewhere that say that even in their native language nobody knows that many words (or at least uses them on a regular basis).

Besides, once you reach a level of proficiency with Kanji, you'll be able to know what the word means anyway.
#3
Stolen from this page: http://www.askoxford.com/asktheexperts/f...umberwords

English has over 171,476 words in current use, and 47,156 obsolete words. To this may be added around 9,500 derivative words included as subentries.

How is this related? I think English words are composed of Latin and Greek 'roots' (among others). Things like "urban", "-tion", "appear", etc. I think roots and the kanji within compound words are more or less the same. How many of these 171,476 words do you think you know? How many do you think you need to know? I imagine there's a parallel in Japanese. Don't worry about knowing all the words in the Japanese dictionary. I doubt the Japanese do.

Edit: And I agree with Asriel, you'll probably get good enough to guess a lot of them. Let's say you just finished RTLR (Remembering The Latin Roots, a Heisig copycat) and came across a word called "urbanization". Well, crud. A new word. But wait, "urban"? That means city.... "-ization"? That means to form into... does that new word mean "To form into a city?" Japanese isn't always so nice, but still, I agree with Asriel.
Edited: 2009-04-29, 9:16 pm
#4
I suggest diving into these lists: http://forum.koohii.com/showthread.php?tid=2419
#5
I would be worried if there were fewer words than that. It'd be difficult to express things, wouldn't it?
Anyway, this is not a math forum, so let's forget the numbers :P
#6
But the first 10'000 are the hardest... ;)

Seriously, yes, this is a scary number. No, there is no need to panic.

Typically, such word counts include a great deal of redundancy a great deal of redundancy (ok old joke, couldn't resist). A lot of compounds are made by reusing familiar parts and can easily be understood once you know those parts.

For example, a word from my SRS: 原子力発電 (oh 5 kanji! scary!) but it's really an extremely easy word.

If you break it down into its parts, 原子 (nuclear), 力 (power) and 発電 (generation of electricity), the whole word simply means "nuclear power generation". All the kanji are common and pronounced according to their on readings. And if you know that one, you can understand 水力発電 without even trying.
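The breakdown above can be sketched in a few lines of Python. This is only an illustration: the mini-dictionary KNOWN_PARTS and the function split_compound are made up for this example, the glosses are the ones from this post (plus 水 = water, which is my addition), and a real dictionary lookup would be far more involved.

```python
# Hypothetical mini-dictionary of parts; glosses follow the post,
# except 水 (water), which is added here for the second example.
KNOWN_PARTS = {
    "原子": "nuclear",
    "水": "water",
    "力": "power",
    "発電": "generation of electricity",
}


def split_compound(word, parts=KNOWN_PARTS):
    """Split a compound into known parts using greedy longest-match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            chunk = word[i:j]
            if chunk in parts:
                pieces.append((chunk, parts[chunk]))
                i = j
                break
        else:
            # No known part starts here: keep the single character, unglossed.
            pieces.append((word[i], "?"))
            i += 1
    return pieces


for compound in ("原子力発電", "水力発電"):
    print(compound, split_compound(compound))
```

Greedy longest-match is just the simplest choice that works for these two words; it can split other compounds wrongly, so treat it as a toy.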

How many words do you need to learn? It's hard to give an exact number. I've done some frequency analysis on English ebooks. The Harry Potter series uses about 10'000 different words (common word inflections are counted as only one word). The popular fantasy series "A Song of Ice and Fire" also gives me the same number (which interestingly seems to show that Harry Potter isn't really easier to read than an adult fantasy series). Both combined total about 14'000.

(Now my tests were using English books (I wouldn't know how to do the same in Japanese) but even if there are some differences, I think those numbers will mostly apply to Japanese too.)

I've then done some tests using the corpus from Project Gutenberg. If you know the 5'000 most common English words from that list, you know 90% of the words in those texts. 10'000 -> 95%, 20'000 -> 98%, ...

So my estimate is that with about 20'000-30'000 words you can read most novels and only occasionally need to look a word up in the dictionary.
#7
Codexus Wrote:How many words do you need to learn? It's hard to give an exact number. I've done some frequency analysis on English ebooks. The Harry Potter series uses about 10'000 different words (common word inflections are counted as only one word). The popular fantasy series "A Song of Ice and Fire" also gives me the same number (which interestingly seems to show that Harry Potter isn't really easier to read than an adult fantasy series). Both combined total about 14'000.

...

I've then done some tests using the corpus from Project Gutenberg. If you know the 5'000 most common English words from that list, you know 90% of the words in those texts. 10'000 -> 95%, 20'000 -> 98%, ...

So my estimate is that with about 20'000-30'000 words you can read most novels and only occasionally need to look a word up in the dictionary.
How did you go about doing that? Very interesting results by the way and actually kind of reassuring to know that while, yes there are a huge number of words in a language, you really only need to know a relatively small subset of them for 90% of all situations.
#8
blackmacros Wrote:How did you go about doing that? Very interesting results by the way and actually kind of reassuring to know that while, yes there are a huge number of words in a language, you really only need to know a relatively small subset of them for 90% of all situations.
90% isn't as good as it sounds. You'll still have to look up several words per page.
#9
yukamina Wrote:90% isn't as good as it sounds. You'll still have to look up several words per page.
You're right, at 90% it's still very difficult to read. That means that 1 word out of 10 is unknown, at least at the beginning of the book. The idea is that if we remember the new words, reading gets much easier by the end.

As to how I did it, I used some ebooks I found on the net, and I got the frequency list built from the Project Gutenberg corpus from Wiktionary. Then I wrote a little Python script to do the counting.

Anyway, these results shouldn't be considered very accurate, but I wanted to get a rough idea of the size of that mountain of Japanese I'm climbing ;)
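For anyone curious, the kind of counting described in this post could look roughly like the sketch below. It is not the original script: the function names, the crude regex tokenizer, and the 300-words-per-page figure are all assumptions made for illustration, and unlike the numbers quoted above it does not merge inflections ("walk", "walked") into one word.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def distinct_words(text):
    """Number of different word forms in the text (no inflection merging)."""
    return len(set(tokenize(text)))


def coverage(text, frequency_list, top_n):
    """Fraction of running words in text found among the top_n listed words.

    frequency_list is assumed to be ordered from most to least frequent.
    """
    known = set(frequency_list[:top_n])
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def unknown_per_page(cov, words_per_page=300):
    """Rough count of unknown running words per page (300 is a guess)."""
    return round((1 - cov) * words_per_page)


sample = "The cat sat on the mat"
toy_list = ["the", "on", "cat"]  # stand-in for a real frequency list
print(distinct_words(sample), coverage(sample, toy_list, 2))
print(unknown_per_page(0.90), unknown_per_page(0.98))
```

Under that 300-words-per-page guess, 90% coverage leaves about 30 unknown running words on a page, many of them repeats of the same word, and 98% leaves about 6.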
#10
As an experiment, I decided that while I was reading "A Princess of Mars" by Edgar Rice Burroughs I would look up every word whose definition I wasn't confident I knew. I ended up looking up a lot of words on every page. For instance, the first chapter starts off in an Old West setting (in Arizona), and I had to look up carbine (a type of gun made for shooting from a horse), hogback (a type of ridge on the landscape), and imprecation (curse; the book said that the Indians were "throwing imprecations" and at first I thought they were actually throwing something). That's vocabulary for you. I thought I heard that English has a much larger vocabulary than other languages. I'm not sure if that's true or not, but if it is, then maybe Japanese would actually be easier eventually.
Edited: 2009-04-30, 11:57 am
#11
Actually, I heard Japanese has a large vocabulary too. At least, that you have to learn more words to reach the same level in Japanese than you'd have to learn for, say, Spanish.
#12
English has a much better documented history than most languages... the OED is HUGE. And most statistics include the seemingly limitless technical and specialized vocabulary of scientific, engineering, and medical fields (which, to be technical, is usually of Latin and Greek origin). I once saw a statistic (which alas I can no longer find) that the number of words in common use in English is only about twice that of other languages, and that this is due to the large number of synonyms for common, everyday words. Part of the reason is that it is considered bad form to repeat a word, so the use of synonyms is encouraged, but many of these words also have their origins in different groups importing the same word from different languages, which is to be expected of any "universal" language.
Edited: 2009-04-30, 12:19 pm
#13
First of all, if you do 10,000 sentences with at least two new compounds per sentence, that's already almost a third of them. Plus, not all of those 70,000 compounds or whatever are used very often. AND once you learn enough readings, due to the power of meaning carried by each kanji, I'd be willing to bet you'll be able to guess the meaning and pronunciation of new compounds and words really easily. And, dude, the more reading you do, the easier it'll get. Don't sweat it.
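The "almost a third" figure is easy to check. A tiny sketch, taking the 70,000 total from the first post and the two-new-compounds-per-sentence rate from this one at face value (the variable names are just for illustration):

```python
sentences = 10_000
new_compounds_per_sentence = 2
total_compounds = 70_000  # the figure quoted in the first post

learned = sentences * new_compounds_per_sentence
share = learned / total_compounds
print(f"{learned} compounds, {share:.1%} of {total_compounds}")
```

That comes to 20,000 compounds, or about 29% of the total, so a bit under a third.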
#14
igordesu Wrote:First of all, if you do 10,000 sentences with at least two new compounds per sentence,
I wouldn't recommend putting too many new things in a single sentence, as they become hard to remember correctly and you end up with many more revisions than you would have with twice that number of +1 sentences.
#15
I always thought the idea of sentences was to add one new 'thing' (word, grammar structure) in each one.
#16
I don't really understand this i+1 idea. I hear the term tossed around a lot, but what's it mean?
#17
As I understand it, it can mean studying something that has just one new element to learn. For example, one new vocabulary word in a sentence. That's what I meant in my post.

Or when talking about less specific things, it can also mean studying material that's just a step beyond your current level. The idea is to learn incrementally and not be overwhelmed by too much new information.
#18
I guess you're right. I've been wondering how many English words I know after seeing this thread, and many of them are related.

So none of you have used kanji compound frequency lists? Is there such a thing?

So I should just not worry about it, then, and just move on to sentences like I'd planned?
#19
Thunk Wrote:I guess you're right. I've been wondering how many English words I know after seeing this thread, and many of them are related.

So none of you have used kanji compound frequency lists? Is there such a thing?

So I should just not worry about it, then, and just move on to sentences like I'd planned?
You should probably search the forum a little more. Compound/vocabulary frequency lists have been the subject of numerous discussions and are the primary focus of almost all group vocab sentence projects. In fact just a few posts up someone linked to one of these frequency lists.
#20
Try http://forum.koohii.com/showthread.php?tid=2419 and (BROKEN LINK) http://forum.koohii.com/viewtopic.php?id=918.
#21
activeaero Wrote:You should probably search the forum a little more. Compound/vocabulary frequency lists have been the subject of numerous discussions and are the primary focus of almost all group vocab sentence projects. In fact just a few posts up someone linked to one of these frequency lists.
Thanks. I was already going to do those lists, though, after Tae Kim's grammar. I was looking for compound-specific lists, and had searched for kanji compounds on here before I posted this thread, and the only list that came up was one that had about 100 of them for a high school entrance exam. I wondered if there would be more like it.

But it sounds like I can learn them fine through the material so many here have contributed to iKnow. That's what I'm gleaning from these posts, anyhow.

Thank you!
#22
Thunk Wrote:
activeaero Wrote:You should probably search the forum a little more. Compound/vocabulary frequency lists have been the subject of numerous discussions and are the primary focus of almost all group vocab sentence projects. In fact just a few posts up someone linked to one of these frequency lists.
Thanks. I was already going to do those lists, though, after Tae Kim's grammar. I was looking for compound-specific lists, and had searched for kanji compounds on here before I posted this thread, and the only list that came up was one that had about 100 of them for a high school entrance exam. I wondered if there would be more like it.

But it sounds like I can learn them fine through the material so many here have contributed to iKnow. That's what I'm gleaning from these posts, anyhow.

Thank you!
Yes, the iKnow/Smart.fm lists are arranged by frequency of use. The only issue is how the frequency-of-use list was created, as people have varying opinions on what is "frequent". Most people, on this forum at least, think that the original iKnow list was fairly poor. I used to argue against this but now I totally agree (apologies to whoever I got in that argument with in the past lol). Most consider the KO2001 list to be excellent, so what some great people have done is take the KO2001 frequency list and use iKnow to create their own list based on that.
#23
activeaero, after your post, I finally read the description page of KO2001 (thank you Travis for the link). It contains just what I was looking for - frequency compounds. I'm all motivated again.

I have 150 kanji left in RTK1. Can't wait until I can get to all this good stuff! (after kana, and grammar, of course. : )
#24
yukamina Wrote:Actually, I heard Japanese has a large vocabulary too. At least, that you have to learn more words to reach the same level in Japanese than you'd have to learn for, say, Spanish.
Spanish is cake. I came into this with a false sense of confidence because of learning Spanish. I studied it 24/7 for 2 months in the States, and then after immersing myself by living in Spain, I was fluent in two months.

But now I know. If languages were assigned grades by level of tediousness/difficulty, then Spanish would be kindergarten, and Japanese would be college!

(Chinese would be medical school, because tonal languages do not roll off my tongue. They bounce around on it like a trampoline, hit the roof of my mouth, and tumble out in incoherent gurgles)
Edited: 2009-04-30, 8:35 pm
#25
from 合格が出来る一級「はじめに」

語彙は成人の母語話者の語彙量がやく50,000語と言われますから、日常的な話題にはほぼ対応できる語彙量と考えていいでしょう。

Quick translation: It is said that a native speaker would know around 50 000 words.

This doesn't seem impossible to me. More than kanji, I always thought that the ultimate difficulty in Japanese lay in the number of words to be learned.
Edited: 2009-05-01, 7:49 am