Joined: Nov 2005
Posts: 269
Thanks:
0
has anyone computed what percentage of each JLPT level the vocab in core 2000 and 6000 covers?
or more practically, would one's vocab be strong enough for 1級 after completing core 6000?
Joined: Jul 2007
Posts: 2,313
Thanks:
22
There are a few words in the course's sample sentences that are not in the course's vocabulary list. And like Nest0r says, some words are spelled in kana where one could use kanji.
What you can do is take the spreadsheet of Core 2k and 6k's vocabulary (kanji and kana), merge it with the JLPT list (kanji and kana), and use a formula that looks for matches and marks another cell, tagging the JLPT words that are not in the Core list. It's similar to what I did with the Tanuki list and the Core list, tagging words to remove blatant duplicates (about 2600 of the 7100 Tanuki words). Just remember to keep columns for the original sort number and for whether the row comes from the Core or the Tanuki list. Also, when you sort, sort by the kana column first, then by the kanji column.
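The same tagging idea can be sketched outside a spreadsheet. This is only a rough illustration, not the formula actually used: the sample rows and the names `tag_jlpt_rows`, `in_core`, and `kana_in_core` are made up for the example, and real data would come from the Core and JLPT spreadsheets.

```python
# Hypothetical sample rows as (kanji, kana); real data would come from the spreadsheets.
core_rows = [("食べる", "たべる"), ("水", "みず"), ("きれい", "きれい")]
jlpt_rows = [("食べる", "たべる"), ("綺麗", "きれい"), ("経済", "けいざい")]

def tag_jlpt_rows(jlpt_rows, core_rows):
    """Tag each JLPT row with whether it also appears in the Core list."""
    core_pairs = set(core_rows)                  # exact (kanji, kana) matches
    core_kana = {kana for _, kana in core_rows}  # looser match on reading only
    tagged = []
    for order, (kanji, kana) in enumerate(jlpt_rows, start=1):
        tagged.append({
            "order": order,      # original sort number, so the list can be restored
            "source": "JLPT",    # which list the row came from
            "kanji": kanji,
            "kana": kana,
            "in_core": (kanji, kana) in core_pairs,
            "kana_in_core": kana in core_kana,
        })
    # Sort by kana first, then by kanji, as suggested above.
    return sorted(tagged, key=lambda r: (r["kana"], r["kanji"]))

tagged = tag_jlpt_rows(jlpt_rows, core_rows)
missing = [r["kanji"] for r in tagged if not r["in_core"]]
print(missing)
```

Rows where `in_core` is false but `kana_in_core` is true are the ones worth checking by hand: they may be the same word written in kana in one list and kanji in the other, or just homophones.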
Edited: 2009-08-06, 6:36 am
Joined: Jun 2008
Posts: 160
Thanks:
0
I have completed the core 6000 course and have added Anki facts for any words that are in the JLPT2 list but missing from core 6000. That comes to about 390 words. So, IMO, core 6000 would be adequate for JLPT2 but not JLPT1.
Joined: Apr 2009
Posts: 723
Thanks:
0
Wow, 4704 unique words? I've read here before that Coscom says the KO book contains something on the order of 3600 unique words. I guess the smart.fm sentences are much more word-dense?
Joined: Sep 2008
Posts: 1,674
Thanks:
1
Thanks heaps. Those figures are very interesting and that's a really handy comparison to have.
So after completing KO2001 you would effectively be able to read 50% of anything and after completing Core6000 you would effectively be able to read 66% of anything.
Although, I heard that KO2001 covers 80% of written material? Maybe not in terms of totality but frequency?
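To make the difference between the two kinds of coverage concrete, here is a toy sketch. The token list and the `coverage` helper are invented for illustration and are not taken from KO2001 or any real corpus. It counts coverage once over unique words and once over running words weighted by frequency.

```python
from collections import Counter

def coverage(known, tokens):
    """Return (share of unique words known, share of running words known)."""
    counts = Counter(tokens)
    type_cov = sum(1 for w in counts if w in known) / len(counts)
    token_cov = sum(c for w, c in counts.items() if w in known) / len(tokens)
    return type_cov, token_cov

# Made-up text: two very frequent words and two rare ones.
tokens = ["の"] * 5 + ["は"] * 3 + ["経済", "憲法"]
known = {"の", "は"}

type_cov, token_cov = coverage(known, tokens)
print(f"unique words known: {type_cov:.0%}, running words known: {token_cov:.0%}")
```

In this toy case knowing half of the unique words covers 80% of the running text, because the known words are the frequent ones. The percentages against the JLPT lists are of the first kind, so they say little about how much of a real text is covered.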
Joined: Nov 2005
Posts: 269
Thanks:
0
i switched the jlpt data to the "JLPT Vocabulary" shared anki deck. it seems better, plus i get each level separate. here are the new figures:
core 2000+6000:
sentences: 6000
unique words: 6435
word repeats: 13700
cumulative:
1級: 65.9% (5431/8243)
2級: 82.6% (3781/4580)
3級: 94.5% (1092/1156)
4級: 97.1% (572/589)
exclusive:
1級 only: 45.0% (1650/3663)
2級 only: 78.5% (2689/3424)
3級 only: 91.7% (520/567)
4級 only: 97.1% (572/589)
KO2001 smart.fm:
sentences: 3437
unique words: 4704
word repeats: 8519
cumulative:
1級: 49.8% (4106/8243)
2級: 62.5% (2863/4580)
3級: 82.5% (954/1156)
4級: 88.6% (522/589)
exclusive:
1級 only: 33.9% (1243/3663)
2級 only: 55.8% (1909/3424)
3級 only: 76.2% (432/567)
4級 only: 88.6% (522/589)
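For anyone checking the arithmetic: the cumulative rows are the running sums of the "only" rows, starting from 4級. A small sketch using the core 2000+6000 figures above (the function name `cumulative_coverage` is just for this example and is not the script that produced the figures):

```python
def cumulative_coverage(exclusive):
    """exclusive maps level -> (matched, total) for words that belong only to that level.

    Levels are accumulated from the easiest (4級) to the hardest (1級).
    """
    result = {}
    matched_sum = total_sum = 0
    for level in sorted(exclusive, reverse=True):  # 4, 3, 2, 1
        matched, total = exclusive[level]
        matched_sum += matched
        total_sum += total
        pct = round(100 * matched_sum / total_sum, 1)
        result[level] = (matched_sum, total_sum, pct)
    return result

# "exclusive" figures for core 2000+6000 as posted above.
core_exclusive = {1: (1650, 3663), 2: (2689, 3424), 3: (520, 567), 4: (572, 589)}

for level, (matched, total, pct) in sorted(cumulative_coverage(core_exclusive).items()):
    print(f"{level}級: {pct}% ({matched}/{total})")
```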
Edited: 2009-08-09, 10:53 pm
Joined: May 2009
Posts: 31
Thanks:
0
This seems to confirm my suspicion that the majority of JLPT1-specific words are currently not on any list anywhere (not counting the pure vocab lists).
Joined: Jun 2008
Posts: 160
Thanks:
0
The only "list" with JLPT1 coverage would be the Kanji in Context sentences. I believe they cover the majority of the vocab required for the JLPT. Also, books like the UNICOM 1kyuu vocab book have a lot of sentences which cover some amount of the required vocab.
Joined: Aug 2006
Posts: 1,022
Thanks:
1
radical_tyro, do you think you could put your script in an Anki plugin? Or if you share it I can have a go at it. It'd be nice to have a word count estimate in Anki.
I also want to get statistics for Reibun de Manabu, which I've been going through. I can also check what I have from KiC (a bit more than half of workbook 1).
Edited: 2009-08-12, 3:40 pm
Joined: Aug 2006
Posts: 1,022
Thanks:
1
Thanks.
I ran it on the 1500 sentences I have from Reibun de Manabu so far. This is a cherry-picked selection from the first 3/4 of the book, excluding things I thought were too easy, so the actual counts would be higher.
unique words: 3271
word repeats: 6401
1級 only: 17.8% (652/3663)
2級 only: 54.6% (1868/3424)
3級 only: 71.6% (406/567)
4級 only: 79.5% (468/589)
cumulative:
1級: 41.2% (3394/8243)
2級: 59.9% (2742/4580)
3級: 75.6% (874/1156)
4級: 79.5% (468/589)
And this is from the first 44 chapters (~850 sentences) of KiC:
unique words: 2596
word repeats: 3542
1級 only: 17.2% (631/3663)
2級 only: 31.2% (1068/3424)
3級 only: 56.3% (319/567)
4級 only: 71.5% (421/589)
cumulative:
1級: 29.6% (2439/8243)
2級: 39.5% (1808/4580)
3級: 64.0% (740/1156)
4級: 71.5% (421/589)
Edited: 2009-08-13, 10:35 am
Joined: Jul 2007
Posts: 2,313
Thanks:
22
Well, since you are doing this, how does the Tanuki list look? At 7000 entries covering the entire jouyou, it's bound to cover a lot of JLPT material.
Joined: Aug 2006
Posts: 1,022
Thanks:
1
Tanuki list:
unique words: 8682
word repeats: 20801
1級 only: 42.8% (1568/3663)
2級 only: 57.9% (1983/3424)
3級 only: 78.3% (444/567)
4級 only: 85.1% (501/589)
cumulative:
1級: 54.5% (4496/8243)
2級: 63.9% (2928/4580)
3級: 81.7% (945/1156)
4級: 85.1% (501/589)
Hmm, not a great match either. It seems that Tanuki has a lot of words not on those JLPT lists. At the same time, it doesn't kanjify all the words, which is part of why they don't match up.
Edited: 2009-08-13, 10:36 am
Joined: Jun 2006
Posts: 736
Thanks:
0
If you run the lists through MeCab first, all of those issues should go away. I'll be able to report when the plugin is finished (soon).
Joined: Sep 2008
Posts: 1,674
Thanks:
1
Can you run the whole of KIC through this?
Joined: Nov 2005
Posts: 269
Thanks:
0
has KIC been transcribed?