kanji koohii FORUM
For those who finished Core 2k/6k/10k - Printable Version



For those who finished Core 2k/6k/10k - vosmiura - 2014-07-11

jeffberhow Wrote:cophnia61, vosmiura:

Here's a great read I came across when looking up the meaning of word family.
http://www.nflrc.hawaii.edu/rfl/PastIssues/rfl82hirsh.pdf

This also helped me understand vosmiura's stats up there and gave me more confidence in the structural learning of vocab. I hope not to keep going off topic.
Thanks, nice resource. I think some people had mixed up the % of known words with % of known tokens.

For the My Girl drama, the first 10,000 words gave us 85.03% coverage of words and 99+% coverage of tokens, so it should be quite easy to enjoy at that level.


For those who finished Core 2k/6k/10k - jahnke - 2014-07-11

jeffberhow Wrote:I was curious and did an online test, and I only know about 22k words in English as a native speaker: http://testyourvocab.com/result?user=4258365
I did this test some months ago and got a little above 11k words in English. This is more than enough to fluently understand TV series like How I Met Your Mother, The Big Bang Theory, Breaking Bad etc. I can also fluently read science fiction and fantasy books like A Song of Ice and Fire, A Fire Upon the Deep and others.

Every time I switch to a new TV or book series I have trouble for a while. With a new TV show I need a few episodes to get used to the new characters. New books are not really a problem, as I can just read slowly at the beginning.

I hope that I will be able to understand some Japanese shows without subtitles with fewer than 6k words, and to read Japanese novels when I reach around 10k. But that is probably not a good estimate: in English, even when I don't know a word it is easy to guess, because there are tons of words of Latin origin in English and my native language is Portuguese. With Japanese I don't know whether just knowing some words with the same kanji is enough to guess the meaning of a new word, and I am almost sure that I cannot guess the pronunciation.


For those who finished Core 2k/6k/10k - jeffberhow - 2014-07-11

jahnke,

What method are you using for your vocabulary building, and what is your average rate? Also, are you using the same method as you used for English?

I think the kanji will help a lot for context clues.


For those who finished Core 2k/6k/10k - jahnke - 2014-07-12

jeffberhow Wrote:What method are you using for your vocabulary building, and what is your average rate? Also, are you using the same method as you used for English?
I'm "done" with Tae Kim, but I can only recognize the grammar. I did this using recognition cards in Anki. Now I'm slowly trying to understand the grammar better by rereading the guide and writing at lang-8 to see how far I can go.

I've been using core6k (recognition) for a long time. I was learning 10 words a day, but now I have one month of vacation, so I decided to rush a little bit and I'm adding 50 cards a day. I'm at around 2k cards.

I'm also trying to read some pages of a manga, NHK Easy News (with Rikaikun) and short stories from ふぁんた時間. I'm very slow and reading is very painful.

For English I didn't have any method to learn. I just played a lot of games, watched tons of TV shows and read a lot. Recently I decided that I have to learn to write properly, and I'm writing at lang-8.

I tried this with Japanese but it didn't work very well. =)


For those who finished Core 2k/6k/10k - cophnia61 - 2014-07-12

Perfectionist paralysis

Quote:They need to know as many words as possible before they dare try to use them with a native. What if the native casually mentions his pet budgerigar and you haven’t learned that word yet?



For those who finished Core 2k/6k/10k - cophnia61 - 2014-07-12

MaxHayden Wrote:
cophnia61 Wrote:EDIT2:

I did it just now for MyGirl drama and the conclusion is that the words which fall under the 10k in VDRJ's list do appear in total 74088 times in that drama, while the remaining words appear 588 times. So if we know only the first 10k words in VDRJ's list, we'll encounter a word we don't know one time every 126 words.
Yeah. This seems much more reasonable. FWIW, for spoken English, 95% coverage (1 word in 20) is 3000 word families. 98% (1 in 50) is 7000. Because we are using lexemes instead of word families, the numbers for Japanese should be a little different. But the point is that you can probably get to a point where you can understand that drama and use it for comprehensible input with fewer than 10k words. (Maybe it would be a useful project to take a whole bunch of subs and generate a table listing the vocabulary levels you need for 95% and 98% coverage so that people can see what shows rank where.)
Man, this is a real drama :( I think I did that calc wrong :/ Now that I've written a script to calculate it, it seems that with the first 15k words you're going to see 1 unknown word every 8.5... Is that possible? Oh no :'( I'm going to kill myself :(

My Girl JDrama word frequency report

Can someone write a script just to see if I'm wrong? The first column is the number of times that particular word appears in the drama. Could you tell me, assuming hypothetical knowledge of the first 15000 words (in one of those lists), how often I'll encounter an unknown word in that drama? Because it's clear my math skills are way worse than I thought... :(

I did this:

- for the first 15000 words in the SUW list:
  - find the words in the "My Girl word frequency list" that are present in those 15000 words
  - for every word that is present in the 15000 word list, add the number of times it appears to the running total of the previous words
- for the remaining words in the SUW list:
  - ...do the same thing...
- divide the overall number of appearances of the words in the 15000 by the overall number of appearances of the remaining words

Please tell me I did it wrong :)
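
For anyone who wants to check the steps above, here is a minimal Python sketch of the same calculation. The names `token_coverage`, `tokens_per_unknown`, `ranked_words` and `drama_counts`, and the toy data, are invented for illustration and are not taken from any script in this thread. Two choices here are assumptions: words that are not in the ranked list at all are counted as unknown, and the "one unknown every N words" figure divides all tokens (not only the known ones) by the unknown tokens.

```python
def token_coverage(drama_counts, ranked_words, cutoff):
    """Return (known, unknown) token totals for someone who knows the top `cutoff` words."""
    known_words = set(ranked_words[:cutoff])
    # Sum the appearance counts of every drama word that is in the known set.
    known = sum(n for word, n in drama_counts.items() if word in known_words)
    # Everything else is unknown, including words missing from the ranked list.
    unknown = sum(drama_counts.values()) - known
    return known, unknown


def tokens_per_unknown(known, unknown):
    """Average number of running tokens per unknown token (20.0 means 1 in 20)."""
    return (known + unknown) / unknown


# Toy data, invented for illustration only.
ranked_words = ["の", "は", "私", "行く", "空席"]           # most frequent first
drama_counts = {"の": 50, "は": 30, "私": 15, "空席": 2, "掠り傷": 3}

known, unknown = token_coverage(drama_counts, ranked_words, cutoff=3)
coverage = known / (known + unknown)
print(f"known={known} unknown={unknown} coverage={coverage:.2%} "
      f"one unknown every {tokens_per_unknown(known, unknown):.1f} tokens")
```

With real data, `drama_counts` would come from the first column of the drama frequency report and `ranked_words` from the frequency list, with both files using the same spelling for each word.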

EDIT:

Only now did I realize I can try to read one of those subtitles just to see for myself... I read the first 100 words of the first episode and looked up every word in a couple of those frequency lists. They are all frequent words; in truth I already know most of them xD
The only word above 15000 was 空席, and it appeared two times in the same sentence.
Now... either the first 100 words are extremely easy and after that the drama becomes difficult, with an infrequent word every 8 words, or I don't know...
Tomorrow I'll continue and read the whole first episode, and after that I'll report how many infrequent words appear and how often :P


For those who finished Core 2k/6k/10k - vosmiura - 2014-07-12

I changed my script to use the token count here http://pastebin.com/8BHrvt24

Here's what I got:

Using lexeme DB from 'C:\temp\VDRJ_Ver1_1_Research_Top60894.csv'
Matching lexemes in 'C:\temp\My Girl JDrama Morphemes+Freq.txt'
Using 'Standard (Newspaper) Orthography' lexemes
Using 'Word Ranking for General Learners' ranks
Loaded 60894 lexemes
Lexemes matched: 2627 Missing 1293
Tokens matched: 61916 Missing 12760
Coverage of found tokens
Range           Match    Tot        %
    1 ~  5000 : 59380  59380   95.90%
 5001 ~ 10000 :   965  60345   97.46%
10001 ~ 15000 :   726  61071   98.64%
15001 ~ 20000 :   215  61286   98.98%
20001 ~ 25000 :   250  61536   99.39%
25001 ~ 30000 :    48  61584   99.46%
30001 ~ 35000 :    28  61612   99.51%
35001 ~ 40000 :   141  61753   99.74%
40001 ~ 45000 :   108  61861   99.91%
45001 ~ 60894 :    55  61916  100.00%
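
Note that the % column is relative to the 61916 found tokens only. The short Python sketch below recomputes the cumulative column from the per-range matches in the report. It also shows the same figures with the 12760 missing tokens included in the denominator. Treating every missing token as unknown is an assumption made here for illustration: some of them may be names or spelling variants that failed to match, so the second column is best read as a pessimistic bound. The variable and function names are mine, not from the pastebin script.

```python
# Per-range token matches and range upper bounds, copied from the report.
band_upper = [5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 60894]
band_matches = [59380, 965, 726, 215, 250, 48, 28, 141, 108, 55]
missing_tokens = 12760  # "Tokens matched: 61916 Missing 12760"

found_total = sum(band_matches)
all_tokens = found_total + missing_tokens


def cumulative_coverage(denominator):
    """Return (upper_rank, running_total, percent) rows for the given denominator."""
    rows, running = [], 0
    for upper, matched in zip(band_upper, band_matches):
        running += matched
        rows.append((upper, running, 100.0 * running / denominator))
    return rows


found_rows = cumulative_coverage(found_total)  # same basis as the report
all_rows = cumulative_coverage(all_tokens)     # missing tokens counted as unknown

for (upper, running, pct_found), (_, _, pct_all) in zip(found_rows, all_rows):
    print(f"top {upper:>5}: {running:>5} tokens  "
          f"{pct_found:6.2f}% of found  {pct_all:6.2f}% of all")
```

On that pessimistic basis the first 5000 words cover roughly 79.5% of all tokens rather than 95.90%, which is why it matters which denominator a coverage figure uses.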


For those who finished Core 2k/6k/10k - cophnia61 - 2014-07-12

vosmiura Wrote:I changed my script to use the token count here http://pastebin.com/8BHrvt24
Thanks vosmiura!

In the meantime I read the first 25 mins of the first episode of My Girl xD I don't know how many words there were, but in those 25 mins the only words above 15k that appear are:

空席
従妹
孫娘

掠り傷 (this is not in the list, but 掠り and 傷 are, and they are above 15k)
脈拍 (absent, but I'm not sure)

So all I can say is that a person with a 15k vocab can certainly watch that episode, and in 25 mins would need to look at the dictionary only four or five times. In general most of the words are even below 10k, so I'm reassured again now! ahahaha


For those who finished Core 2k/6k/10k - vosmiura - 2014-07-12

Btw, for those who use Morphman: I use it with a slightly different ordering to keep things i+1 and ordered by frequency.

The default Morphman priority is:
#1 Prioritize cards with least unknowns (i+1)
#2 Prioritize cards based on length
#3 Prioritize cards by frequency of the new morpheme

What I've found in my use is that Morphman will sometimes prioritize many infrequent morphemes (e.g. ones appearing once in the whole deck) over common ones that might appear 20+ times. I think it's more productive to learn the higher frequency ones first, so I just swapped the last two priorities.

To see what I mean...
change: mmi = 10000*N_k + 1000*lenDiff + freq
to: mmi = 10000*N_k + freq * 10 + lenDiff
...in addons\morph\main.py and run Recalc.
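
To illustrate the effect of the swap outside Anki, here is a small standalone Python sketch. It is not Morphman code. It assumes that a lower mmi is shown first, that N_k is the number of unknown morphemes on the card, that lenDiff is a small length penalty (0 to 9) and that freq is a penalty from 0 to 999 which is smaller for more frequent morphemes. Those ranges and the two example cards are assumptions made for illustration.

```python
def mmi_default(n_k, len_diff, freq):
    """Default weighting: unknown count, then length, then frequency."""
    return 10000 * n_k + 1000 * len_diff + freq


def mmi_swapped(n_k, len_diff, freq):
    """Swapped weighting: unknown count, then frequency, then length."""
    return 10000 * n_k + freq * 10 + len_diff


# Hypothetical cards: (label, N_k, lenDiff, freq). Both are i+1 (one unknown).
cards = [
    ("rare morpheme, ideal length", 1, 0, 950),
    ("common morpheme, slightly long", 1, 2, 100),
]

# Sort ascending, on the assumption that the lowest mmi is shown first.
default_order = [c[0] for c in sorted(cards, key=lambda c: mmi_default(*c[1:]))]
swapped_order = [c[0] for c in sorted(cards, key=lambda c: mmi_swapped(*c[1:]))]

print("default:", default_order)
print("swapped:", swapped_order)
```

Under these assumed ranges, freq * 10 + lenDiff stays below 10000, so the number of unknowns still dominates and i+1 cards keep coming first. Only the order among cards with the same number of unknowns changes.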