So I spoke to someone at the University of Tokyo. He says that word families don't work as well for Japanese as they do for English and that he thinks we should just learn from a larger lexeme/lemma list instead.
He has recently compiled a frequency list from a large corpus, and he has a character frequency list as well. He also suggests looking at the frequency lists generated from the Balanced Corpus of Contemporary Written Japanese (BCCWJ), which is organized along slightly different lines.
Since these lists exclude proper nouns, grammatical particles, and the like, I'm going to rerun my frequency coverage stats against them when I have time. I'm also going to try to identify any words that are in Core10k but not in the most frequent parts of those lists, and vice versa.
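For anyone who wants to try the same comparison, here is a rough sketch of what I have in mind. It assumes the frequency list has already been loaded as a lemma-to-count mapping and Core10k as a set of lemmas; the function names and the toy data are made up for illustration and don't reflect the real file formats of either list.

```python
from collections import Counter


def coverage(freq, known):
    """Fraction of corpus tokens accounted for by the lemmas in `known`."""
    total = sum(freq.values())
    if total == 0:
        return 0.0
    covered = sum(count for lemma, count in freq.items() if lemma in known)
    return covered / total


def top_n(freq, n):
    """The n most frequent lemmas in the list, as a set."""
    return {lemma for lemma, _ in Counter(freq).most_common(n)}


def compare(core, freq, n):
    """Return (lemmas only in the core list, lemmas only in the top n)."""
    top = top_n(freq, n)
    return core - top, top - core


# Toy data only; real counts would come from the corpus frequency list.
freq = {"する": 50, "ある": 30, "言う": 15, "猫": 5}
core = {"する", "言う", "犬"}

print(f"coverage: {coverage(freq, core):.2%}")
only_core, only_freq = compare(core, freq, 2)
print("in core list but not in top 2:", sorted(only_core))
print("in top 2 but not in core list:", sorted(only_freq))
```

With the real lists, `n` would be something like 10,000, so that the slice of the frequency list is the same size as Core10k.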
