How far 100 words will get you

#1
http://www.cateeslanguageworld.com/pimsl...0words.php

Do we have a list like that somewhere for the most common 100 or 500 words in Japanese?

Of course the list depends on the material used for the statistics, but I mean a list of the most common words you need to get by in daily life and everyday conversation.

PS: one possibility would be the list of vocabulary from "Linkword Japanese Level I".

...

If we could agree on such a list, it would be great to have the most common vocabulary and the most common grammar patterns as sets of flashcards on this site. These could be part of a series, so learners like me who don't know which step to take next don't have to worry and can just continue reviewing :) (and of course complement that by watching DVDs, listening, and trying to read a little).
#2
A somewhat unrelated thought - does anyone know if there is an 'official' list of words that students need to know for various levels of Nouryoku Shiken? It is usually said that Level 2 requires 7,000 words but none of the lists I have seen so far contains as many - including the one that has been posted elsewhere on this forum...
#3
The number 7,000 is probably the combined total of all the words you need to know for levels 4, 3, and 2, not an additional 7,000 just for level 2. But to be honest, I'm guessing here. I know that the 10,000 or so you need for level 1 is measured that way.

Fabrice: I don't know if such a list exists, but my guess would be that the 100 most common words would be included on level 4 of the proficiency test. Again just a guess, but it may be a good place to start.
#4
Hyland1 presumes correctly, but it's 6000, not 7000. The numbers for the four levels are:

Level 4: 800
Level 3: 1500 (+700, 88% increase)
Level 2: 6000 (+4500, 300% increase)
Level 1: 10000 (+4000, 67% increase)
#5
I came across this some time ago:
(taken from http://ftp.monash.edu.au/pub/nihongo/00I...ml#oth_fil)
Jim Breen Wrote: In 1998 Alexandre Girardi produced a word-frequency list based on 4 years of the Mainichi Shimbun. It contains about 300,000 words. Another version, which Charles Kelly at Aichi Institute of Technology tidied up, is available.
I know the WaKan project played around with word frequency as well. That program may provide the quickest way to generate such a list.
#6
I actually have an Excel file with vocabulary for levels 1, 2, 3, and 4 and kanji for levels 1, 2, and 3. If anyone is interested, I can e-mail it to them. This list is not "official", but it was compiled by a language school based on tests conducted since 1994. The list is pretty comprehensive. I have used these lists to create Twinkle files for vocabulary review and learning. They are very helpful.
#7
ファブリス Wrote: http://www.cateeslanguageworld.com/pimsl...0words.php
If we could agree on such a list, it would be great to have the most common vocabulary and the most common grammar patterns as sets of flashcards on this site. These could be part of a series, so learners like me who don't know which step to take next don't have to worry and can just continue reviewing :) (and of course complement that by watching DVDs, listening, and trying to read a little).
Hi Fab,
Would you consider doing it by JLPT level, maybe splitting it up every 100 words, or would that be too much work?

Also, as far as grammar goes, I'm slowly building a spreadsheet for "Japanese for Everyone", which covers all of the grammar in the book. I'm sure there are a lot of typos, but you are welcome to it. I just finished transcribing ch 12 (of 27).
Thanks,
Leo
#8
The JLPT is different from what I had in mind, but if the review functionality is the same, then it is just a matter of providing me with the data. That said, these are just ideas at the moment; there are no plans yet.

Regarding "Japanese for Everyone", or other sourced material for grammar flashcards, I will need express authorisation from the authors if this happens.
#9
After reading over this post, I decided to try out on a handheld video game what Catee did to Macbeth. The results are here: http://chris-fritz.blogspot.com/2006/11/...t-you.html

Out of 45,314 kanji/kanji words used in the game (2,795 unique), ignoring all kana, the top kanji/words used are:

今日 (1.8% of game kanji)
仕事 (1.5%) [It's a farming game where you hire harvest sprites to work for you.]
食 (1.5%) [Crops/produce may be given to people to eat, so...]
見 (1.4%)
思 (1.2%)
行 (1.2%)
何 (1.2%)
来 (1.2%)
楽 (1.0%)
手伝 (1.0%) [Those sprites help out a lot.]

Obviously such a ...what's the word I'm looking for... A game focused on a specific genre will add weight to certain words which might not be as common in newspapers and literature, but I'm sure the most common ones in this game are good ones to know!
Edited: 2006-11-05, 2:18 am
#10
ChrisFritz Wrote: Obviously such a ...what's the word I'm looking for... A game focused on a specific genre will add weight to certain words
I recently got myself a PSP for the express purpose of playing games in Japanese to boost my vocabulary... no really, honestly :) It's been fun - I've got to play a great game (Metal Gear Acid 2) while learning some new vocab (most of which has to do with blowing stuff up, however!).

I have found that some words come up again and again in the dialogue and game interface, so I've been very happy with my choice. And I get to play games and call it study :)
Edited: 2006-11-05, 12:26 pm
#11
I find it very strange that the authors of that site at the top chose to analyze Macbeth. Macbeth isn't modern English. Why not at least a Darwin novel or something?

Also, this is an absurd claim:
Quote: The English language has the largest vocabulary out of any language with 1,000,000 words by some estimates. Most other languages do not come near that vocabulary, so there's less to learn to get the same "bang" with foreign languages.
Nobody can make the absolute claim that English has the largest vocabulary, and even if it did, the total number of words says nothing about the number necessary to be proficient, since most of that estimated million are words that get used once in a blue moon, and a lot of antiquated and specialized words that most people don't know. I think Japanese could easily require far more words than English in order to attain proficiency.
#12
JimmySeal Wrote: I think Japanese could easily require far more words than English in order to attain proficiency.
It's been my experience that Japanese often uses the same word in various contexts and sentence patterns, which alter the meaning, whereas English just uses a different word. One example that comes to mind is 美味しい (おいしい), i.e. delicious in English. Everything seems to be delicious in Japanese. Do you ever hear a Japanese person say that food that tastes good is not 美味しい? What about tasty, delectable, exquisite, mouth-watering, and the dozens of other words you could use to describe good taste? My guess is that 美味しい can be modified subtly by how it is said (美味しい〜) or by adding ですよ at the end, etc. There are probably other words which describe good taste in Japanese aside from 美味しい, but it does seem you get more "bang for your buck" with Japanese words.
Edited: 2006-11-06, 12:10 am
#13
rgravina, if you watch English television and listen to English conversation around meal times, how often do you hear a meal described as delectable or exquisite?
I think you'd have to go into adult-oriented books of a slightly classical nature to find those words used much.

I think you may be imagining Japanese to be overly simplistic based on the things you hear and encounter in common usage.

Although you can do a lot with Japanese words using particles and conjugation, there are probably quite a few alternative ways to express things that you consider simple enough to need only one word.
Do you know 美味い「うまい」? It's one okurigana away from 美味しい「おいしい」, but it is another word for tasty!
#14
Hi ChrisFritz,
sorry for the late reply, but I wanted to say thank you for the excellent post. It's really interesting. I'm a bit puzzled as to how you examined the data, I guess you ran a script on the data that you obtained from the game cartridge?

It would be excellent if we could make a program like that which could be run on other text sources. For example, a learner could run it on a short text to obtain the vocabulary and study the most common words ahead of time, in order to ease the reading of a Japanese book and tailor the vocabulary flashcards to the reader's needs.
#15
rgravina Wrote: It's been my experience that Japanese often uses the same word in various contexts and sentence patterns, which alter the meaning, whereas English just uses a different word. One example that comes to mind is 美味しい (おいしい), i.e. delicious in English. Everything seems to be delicious in Japanese.
Actually there are lots of ways of describing tastes in Japanese and it's a bit of an art form.

In English, you might not say that it was tasty or delicious, you'd probably just say that it's good. When my friends came to visit me in Japan, they'd always want to know how to say 'this is good'. We tend to categorise everything as good or bad. I've been told it's a very western thing.

I think it depends on the context too. They say that the Eskimos have lots of different words for snow. In Japanese, there are lots of different words for describing the sound that rain makes. In English (at least in Scotland!), there are lots of ways of describing how cold it is.
Edited: 2006-11-13, 2:22 pm
#16
Serge Wrote: A somewhat unrelated thought - does anyone know if there is an 'official' list of words that students need to know for various levels of Nouryoku Shiken? It is usually said that Level 2 requires 7,000 words but none of the lists I have seen so far contains as many - including the one that has been posted elsewhere on this forum...
I'm pretty sure that there is one. I saw it at my last Japanese language school. Not sure where you can get hold of it though.
#17
Wow, now I have a belated reply!

ファブリス Wrote: I'm a bit puzzled as to how you examined the data, I guess you ran a script on the data that you obtained from the game cartridge?
That's more or less it. After copying the game to the computer, I ran a script on the file. The script takes the game's text and replaces all non-kanji characters with spaces. [At least, it replaces most non-kanji characters with spaces. I had to do a bunch of find-and-replaces by hand.] A quick regexp then collapses every run of more than one space into a single space. What remains is space-separated kanji, which can be split into an array.

The code looks something like this. Let's assume the file "words.txt" contains the space-separated kanji.

Code:
$string = file_get_contents("words.txt"); // Load the whole file into a string (the second argument is not a file mode, so "r" is dropped).
$words = preg_split('/\s+/', trim($string), -1, PREG_SPLIT_NO_EMPTY); // Split on whitespace into an array, skipping empty entries.
$counts = array_count_values($words); // Count how many times each word appears in the array.
arsort($counts); // Sort by count, descending, while keeping the word keys. This means the most common word is first.
From there I scribbled code something like this:
Code:
echo "<table border=1>\n";
echo "<thead>\n";
echo "<tr>\n";
echo "<th>Frequency</th>\n";
echo "<th>Kanji</th>\n";
echo "<th>% Learned</th>\n";
echo "<th># Learned</th>\n";
echo "</tr>\n";
echo "</thead>\n";
echo "<tbody>\n";
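$count = 0; // Number of distinct words listed so far ("# Learned").
$total = 0; // Running total of occurrences covered by those words.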
foreach($counts as $word=>$amount)
{
    $count++;
    $total+=$amount;
    echo "<tr>\n";
    echo "<td>$amount</td>\n";
    echo "<td>$word</td>\n";
    echo "<td>",round($total/count($words)*100, 1),"%</td>\n";
    echo "<td>$count</td>\n";
    echo "</tr>\n";
}
echo "</tbody>\n";
echo "</table>\n";
I'm not sure about the accuracy, but with as many words as there are, I'm sure what I posted here and on my blog was "close enough" to accurate. I'm sure this code could be heavily improved. Feel free to take some whacks at it and see what you come up with.
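
As one possible "whack" at it, here is a minimal sketch that answers the thread's original question directly: how many of the most frequent words are needed to cover a given share of the text. The function name wordsNeededForCoverage() and the sample array are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): returns how many of the
// most frequent words are needed to cover $targetPercent of all occurrences.
function wordsNeededForCoverage(array $words, float $targetPercent): int
{
    $counts = array_count_values($words); // word => number of occurrences
    arsort($counts);                      // most frequent first

    $total = count($words);
    $covered = 0; // occurrences covered so far
    $needed = 0;  // distinct words taken so far
    foreach ($counts as $amount) {
        // Stop as soon as the target share has been reached.
        if ($total === 0 || $covered / $total * 100 >= $targetPercent) {
            break;
        }
        $covered += $amount;
        $needed++;
    }
    return $needed;
}

// Made-up sample data, purely for illustration.
$sample = ['今日', '今日', '今日', '仕事', '仕事', '食', '見', '思', '行', '何'];
echo wordsNeededForCoverage($sample, 50), "\n";
```

With the real word list, passing the $words array from the code above and a target such as 80 would tell you how many flashcards it takes to recognise roughly 80% of the kanji words in the game.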

One possible way to get a string of space-separated kanji (to begin the process with) would be to find the Unicode values of the first and last kanji, then strip out every character that isn't within that range. (For example, U+3042 is "あ". The earliest kanji I can find is U+4E00 "一" [table 78 in KCharSelect, and probably Windows' CharMap as well], and the latest I see offhand is "龥" at U+9FA5. I'm sure there's a resource online somewhere that gives the exact range reserved for kanji.) Replacing every character outside this range with a space, then collapsing those down to one space between each run of kanji, might not be the best way to get the kanji count, but it's the easiest I can think of.
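
The range idea above can be sketched with PHP's PCRE functions, which as far as I know handle UTF-8 through the /u modifier without needing mbstring. This is only a sketch under some assumptions: the input is UTF-8, the range used is the main CJK Unified Ideographs block (U+4E00 to U+9FFF, slightly wider than the U+9FA5 mentioned above), and extractKanjiRuns() is a made-up name. Rarer kanji in the extension blocks are ignored.

```php
<?php
// Hypothetical helper: returns the runs of consecutive kanji found in $text.
// Assumes $text is valid UTF-8; only U+4E00..U+9FFF counts as kanji here.
function extractKanjiRuns(string $text): array
{
    // Replace every run of non-kanji characters with a single space.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $spaced = preg_replace('/[^\x{4E00}-\x{9FFF}]+/u', ' ', $text);
    if ($spaced === null) {
        return []; // preg_replace() failed, e.g. on invalid UTF-8 input.
    }
    // Split on spaces and drop empty pieces.
    return preg_split('/ +/', trim($spaced), -1, PREG_SPLIT_NO_EMPTY);
}

// Made-up sample sentence, purely for illustration.
$runs = extractKanjiRuns('今日は仕事を手伝ってください。今日も食べる。');
print_r(array_count_values($runs));
```

Because the non-kanji runs are collapsed in the same step, no separate "more than one space" pass or manual find-and-replace should be needed, and the resulting array can be fed straight into array_count_values().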

Edit: I'd have a proof-of-concept up, but MB String for PHP is not enabled on the server I use. I need to find out how to get it enabled as I plan to use it!

The source code is here:
http://kurifuri.com/files/remote/koohii.com/words.txt

Anyone, feel free to do anything with the code.
Edited: 2006-12-03, 10:51 pm