
Word Frequency and Anki

#1
Hi everyone,

I have been playing around with the idea of learning the most frequent Japanese words first, to give myself the best chance of understanding any given sentence. I came across an interesting principle: knowing only the most frequent 20% of the words in a language lets you understand, on average, about 80% of it.

I have obtained a list of the most frequent words in Japanese from here: http://www.offbeatband.com/wp-content/up...1-3000.txt

Apparently the list was compiled from a corpus of 65 million words, and by learning all of the words on it you should be able to read 85% of the entire corpus, which I found very interesting.
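For anyone curious how a coverage figure like that is worked out, here is a minimal sketch in Python. It assumes the text has already been split into words. The toy tokens and the function name coverage are made up for illustration and are not taken from the list linked above.

```python
from collections import Counter


def coverage(counts, top_n):
    """Fraction of all tokens accounted for by the top_n most frequent words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / total


# Toy corpus, already split into words (hypothetical data, not the real corpus).
tokens = "の は の に を の は た の に".split()
counts = Counter(tokens)

for n in (1, 3, 5):
    print(n, coverage(counts, n))
```

On a real corpus the same calculation with top_n=3000 would show how much of the running text the list accounts for.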

Now of course nothing is perfect, and many of the "words" are single-character hiragana particles or other odd entries, but ignoring these I plan on using the list to create vocab cards for Anki.

Now onto my question:

Does a method or plugin exist for Anki that outputs the most frequent words in a deck? I watch J-dramas and use subs2srs to get sentences, and I would like to learn the most frequent words in a J-drama first in order to maximize my understanding.

I know the best way to keep learning fun and fresh is to learn words you find interesting rather than the most frequent ones, but I have no problem mindlessly studying words, so I think this could be a beneficial method of learning.

If anyone can help me with this or if you just want to share your thoughts on learning by frequency, please feel free to comment.

Thanks.
#2
I'm not sure how to do that with Anki, but as I understand it Core 2K consists of the most frequent 2000 words, Core 6K the next 4000, and Core 10K the next 4000. Since you say you have no problem with mindless word-studying, it looks like you could start with Core 2K.
#3
Or, because the most common words come up the most in native material, you can just work through native material until you know a set percentage of the words in the Japanese language, say 20%, which statistically would consist of the most common words. I'm very analytical: I can tell you that at a rate of 41 cards per day I'll be done with Core by January, at 29 by February, at 23 by March, and so on, but that won't make the words come to me any quicker. The benefit of a list is that it helps you see your accomplishment more clearly. The benefit of 'wild' Japanese is that it shows you can actually use the language. I plan to finish Core, of course, but slowly, as I just pick up words from the internet or books.

Edit: Just thought of something, though: if you add a field to the cards and number them, you can sort the cards by that number and unsuspend them as you want to do them. This way you will be going from most common to least common. There are a lot of programs in the RevTK developer showcase that will sort words for you for different types of media.
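A rough sketch of the numbering idea in Python, for anyone who wants to build the numbers themselves. It assumes the deck's sentences have been exported and already split into words (Japanese has no spaces, so a morphological analyzer such as MeCab would be needed for that step, which is not shown). The card dictionaries and the function names frequency_ranks and sort_cards_by_rank are hypothetical, not part of Anki or subs2srs.

```python
from collections import Counter


def frequency_ranks(tokenized_sentences):
    """Map each word to its rank (1 = most frequent) across all sentences."""
    counts = Counter(word for sentence in tokenized_sentences for word in sentence)
    # Sort by descending count, then by the word itself so ties are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def sort_cards_by_rank(cards, ranks):
    """Return cards ordered from most to least frequent target word."""
    unknown = len(ranks) + 1  # words missing from the ranking go last
    return sorted(cards, key=lambda card: ranks.get(card["word"], unknown))


# Hypothetical, pre-segmented sentences from a deck.
sentences = [
    ["私", "は", "学生", "です"],
    ["私", "は", "行く"],
    ["学生", "は", "行く"],
]
ranks = frequency_ranks(sentences)

# Hypothetical cards; the rank would go into the extra numbered field.
cards = [{"word": "です"}, {"word": "は"}, {"word": "猫"}]
for card in sort_cards_by_rank(cards, ranks):
    print(card["word"], ranks.get(card["word"], "not in list"))
```

The rank number is what you would paste into the added field, and sorting on that field gives the most-common-first order.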
Edited: 2012-10-21, 11:09 am
#4
As Core2k6k is based on the top 10k words from 10 years' worth of newspapers, you should be OK studying from that. On top of using the most common words, it's split into smaller groups of 1-400, 401-2000, 2001-4000 and 4001-6000, with those groups organized by most commonly used kanji and grouped by similar meanings.

Thanks to kanji carrying inherent meanings into the words that use them (with the exception of ateji, IIRC), you tend to get words grouped by similar meanings, which is useful for triggering memorization. Not 100%, mind you, but useful.

Now, after intermediate level you'll find it's useful to get word frequency lists for specialized areas. From what I've heard, learning about 500 specialized words will get you functional in those areas (law, medical, math, engineering, etc.). I'm hesitant to call that a hard rule with regard to Japanese, as it likely applies more to European languages.
#5
I see, thank you everyone for your thoughts. I think I will just give Core6k a go then. Do you have any recommendations on the particular version of Core I should use? I have heard there are a couple of different ways of sorting Core to make it easier, but I don't know enough about it to make a judgement. Since i+1 is generally the most efficient method of learning, and you say Core has on average 1.66 "new" words per sentence, is there a way of ordering them that manages this efficiently? I seem to remember trying Core a long time ago and struggling after hitting a few sentences with 3 or so new words, and thus losing steam and giving up.
I was going to try using the JLUP method of J-J-only Anki cards, since I probably know a good 1500-2000 words. Do you think this will require a lot of work to do for Core? I'm under the impression that 10,000 words will bring one basically to fluency. Is this true with the words from Core?

Sorry about all the questions, I just want to make sure as much as possible that I am not using my time inefficiently before I commit to something as massively time consuming as learning 6,000 sentences.

Thank you very much everyone.
#6
Nukemarine's deck has a reasonable ordering (in fact, there are several you can pick from). There is still the problem of example sentences that use words that are still deep inside the deck. You can do one of these:
1 - ignore the word and only test the highlighted word
2 - learn all unknown words and fail the card if you get any wrong
3 - find other cards that use those extra unknown words and study them (e.g., unsuspend them)

I do 3 and it's working out very well for me, since you get batches of sentences that use vocab you are still learning close together. Now that I think about it, perhaps extracting the order in which I first answered each card and creating a sorting index based on that could be helpful to others. (There's an Optimized-Sent-Index field but it doesn't match what I'm doing. I don't recall what that was for.)
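A minimal sketch of what building that index could look like in Python, assuming the review history has been exported as (timestamp, card id) pairs. How to get that export out of Anki is not shown here, and the name first_answer_index and the sample log are hypothetical.

```python
def first_answer_index(review_log):
    """Map each card id to its position in the order cards were first answered.

    review_log is an iterable of (timestamp, card_id) pairs in any order.
    """
    first_seen = {}
    for timestamp, card_id in review_log:
        # Keep only the earliest review of each card.
        if card_id not in first_seen or timestamp < first_seen[card_id]:
            first_seen[card_id] = timestamp
    # Earliest first; card id breaks ties so the result is deterministic.
    ordered = sorted(first_seen, key=lambda cid: (first_seen[cid], cid))
    return {card_id: index for index, card_id in enumerate(ordered, start=1)}


# Hypothetical review log: card "a" was reviewed twice.
log = [(300, "c"), (100, "a"), (200, "b"), (400, "a")]
index = first_answer_index(log)
print(index)
```

The resulting numbers could then go into a sort field on each card so that others can study in the same order.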

Kewickviper Wrote: Since i+1 is generally the most efficient method of learning and you say Core has on average 1.66 "new" words per sentence is there a way of ordering them that manages this efficiently? I seem to remember trying Core a long time ago and struggling after hitting a few sentences with 3 or so new words and thus losing steam and giving up.
Some people, when confronted with two new words, think "I know, I'll learn both words". Now they know two words.
Edited: 2012-10-24, 5:49 am
#7
Kewickviper Wrote: I'm under the impression that 10,000 words will bring one basically to fluency, is this true with the words from core?

Thank you very much everyone.
My advice is to learn the 10k, which will give you a fantastic foundation, and then focus on adding vocabulary specific to what you enjoy reading, listening to, etc.

10k will never be enough to understand everything, but it's a very good start nonetheless. My Anki decks are over 13,000 and I'm still adding stuff daily : /
Edited: 2012-10-24, 6:33 am
#8
I see. I have just downloaded Nukemarine's deck, thank you! I think I'm just going to battle on with the sentences that have more than one "new" word. Some of the sentences in Tae Kim have 3-4 long new words, and although that makes them super, super difficult, they've gone in eventually.

Is it necessary to do all the suspending and unsuspending stuff with the Core2k/6k? I haven't really been doing things in the order he describes, as I've basically finished Tae Kim and am about 1000ish into RTK, but have only just started Core.

Thank you for the insight on fluency level after core. By "basically to fluency" I just meant able to understand general conversation and topics and read books, watch anime and j-dramas comfortably etc... I understand that there's always more to learn! Even in English I learn probably 3-4 new words a week. This week I learned "accrue" and have been using it in emails around the office all week haha.

I'm a very impatient person and really want to smash out core6k asap by doing 100 a day or something silly, but I've burnt out from doing this in the past haha so this time I'm being patient and finishing RTK first.
#9
There should be an index that sorts the sentences by the kanji in them. It's not optimal, as you get a lot of sentences which all have the same new "common" word. I argue it's best not to learn the sentences or care about the new words in them as sooner or later you will learn them. Use the i+1 and grouped benefit of the vocabulary word order instead.

That said, I did learn it my first time through by learning all the words in each sentence. It's doable.
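For anyone who wants to check how close a given ordering gets to i+1, here is a minimal Python sketch that counts how many words are new in each sentence when the sentences are studied in order. It assumes the sentences are already split into words. The sample sentences, the known-word set, and the function name new_words_per_sentence are hypothetical.

```python
def new_words_per_sentence(tokenized_sentences, known=()):
    """For each sentence in study order, count the words not seen before it."""
    seen = set(known)
    counts = []
    for sentence in tokenized_sentences:
        new = set(sentence) - seen
        counts.append(len(new))
        seen |= new  # words from this sentence count as known from now on
    return counts


# Hypothetical, pre-segmented sentences in the order they would be studied.
sentences = [
    ["私", "は", "学生", "です"],
    ["私", "は", "行く"],
    ["学生", "は", "行く"],
]
counts = new_words_per_sentence(sentences, known={"は", "です"})
print(counts, sum(counts) / len(counts))
```

Running it over two different sort orders of the same deck would show which one keeps the number of new words per sentence lower.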