Using Kanji to optimally sort vocabulary lists of non-kanji languages

Reply #1 - 2012 April 10, 9:10 am
Nukemarine Member
From: 神奈川 Registered: 2007-07-15 Posts: 2347

I've been wondering about this, and perhaps it's already been done and tested: has anyone sorted the vocabulary of a foreign language that doesn't use kanji, such as Korean or French, using simple Japanese or Chinese definitions of those words?

Assume: you have a list of the top 10,000+ common words, derived from a program that tabulates word frequency across a corpus of 10,000 books, websites, blogs, and news posts.
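For anyone who wants to try this, here's a minimal sketch of the frequency-tabulating step in Python. It assumes the corpus is already loaded as plain-text strings and uses a naive "runs of letters" regex as the tokenizer, which is fine for French or Korean spaced words but is only a stand-in for a proper tokenizer. `frequency_list` is a hypothetical helper name, not an existing program.

```python
import re
from collections import Counter

# Runs of Unicode letters only (no digits or underscores); Latin accents and Hangul both count.
WORD = re.compile(r"[^\W\d_]+")

def frequency_list(documents, top_n=10000):
    """Hypothetical helper: return the top_n most common words across the given texts."""
    counts = Counter()
    for text in documents:
        counts.update(w.lower() for w in WORD.findall(text))
    # most_common keeps first-seen order among words with equal counts
    return [word for word, _ in counts.most_common(top_n)]

# Usage with toy data; a real run would feed in thousands of books/blog posts.
docs = ["Le chat mange. Le chien dort.", "le chat dort"]
print(frequency_list(docs, 3))
```

Case-folding with `lower()` merges "Le" and "le"; whether that is wanted (e.g. for German nouns) depends on the language.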

From these words, you use Google Translate, or an even more dependable source, to get simple one- or two-word definitions/equivalent words in Chinese or Japanese, with emphasis on words that are written in kanji.
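Once each word has a Japanese/Chinese gloss, the kanji have to be pulled out of it so they can be sorted on. A rough sketch, assuming the glosses are stored in a simple word-to-gloss dict (the sample glosses below are hand-written for illustration, not real translation output). It only checks the main CJK Unified Ideographs block (U+4E00 to U+9FFF), so kana, the iteration mark 々 and rare extension-block characters are skipped.

```python
def gloss_kanji(gloss):
    """Return the distinct kanji in a gloss, in order of first appearance.

    Only U+4E00..U+9FFF (CJK Unified Ideographs) is treated as kanji;
    kana, Latin letters and extension blocks are ignored.
    """
    return list(dict.fromkeys(ch for ch in gloss if "\u4e00" <= ch <= "\u9fff"))

# Illustrative, hand-written glosses (hypothetical data).
glosses = {"chat": "猫", "chien": "犬", "manger": "食べる",
           "bonjour": "こんにちは", "école": "学校"}

# Keep only words whose gloss actually contains kanji.
kanji_words = {w: gloss_kanji(g) for w, g in glosses.items() if gloss_kanji(g)}
print(kanji_words)
```

Words like "bonjour" whose best gloss is all kana drop out here; they would need to be studied separately or given a different gloss.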

Use Cangy's sorting program combined with the KO2k1 optimized order to sort the foreign-language list.
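I don't know exactly how Cangy's program ranks words, so this is only a guess at one reasonable approach, not a copy of it: each word is keyed on the position of its *latest* kanji in the study order (KO2k1 or any other list passed in as `kanji_order`), so a word becomes "available" once all of its kanji have been reached. Ties fall back to frequency rank. Words whose kanji aren't in the order list go after the listed ones, and words with no kanji go last. All names here are hypothetical.

```python
def sort_by_kanji_order(words, glosses, kanji_order):
    """Hypothetical sketch of kanji-order sorting (not Cangy's actual program).

    words:       foreign words, most frequent first
    glosses:     dict word -> Japanese/Chinese gloss
    kanji_order: iterable of kanji in study order (e.g. a KO2k1-style list)
    """
    position = {k: i for i, k in enumerate(kanji_order)}
    unlisted = len(position)  # sorts after every kanji in the order

    def key(item):
        rank, word = item
        kanji = [ch for ch in glosses.get(word, "") if "\u4e00" <= ch <= "\u9fff"]
        if not kanji:
            return (unlisted + 1, rank)  # no kanji at all: put at the very end
        # The word is learnable once its last-reached kanji has been studied.
        return (max(position.get(ch, unlisted) for ch in kanji), rank)

    return [word for _, word in sorted(enumerate(words), key=key)]

# Toy example with a three-kanji "study order".
glosses = {"chat": "猫", "chien": "犬", "manger": "食べる",
           "bonjour": "こんにちは", "école": "学校"}
words = ["chat", "chien", "manger", "bonjour", "école"]
print(sort_by_kanji_order(words, glosses, "犬猫食"))
```

Using the maximum position (rather than, say, the first kanji) is the usual choice when the goal is "never show a word before all its kanji are known".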

Split the common-word list into a first group of 500 words, a second group of 1,500 (2,000 total), and groups of 2,000 after that. This gives you a list with diminishing returns (the most-used words are learned up front), while within each group the words are clustered by related meaning, since their Japanese or Chinese kanji point to similar concepts. The idea is that you then learn the words in a group more intuitively.
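The tiering itself is simple slicing. A small sketch assuming the input list is already in frequency order: the first two tiers are 500 and 1,500 words, every later tier is 2,000, and the last tier just holds whatever is left. The optional `reorder` callback (a hypothetical parameter) is where a kanji-based sort would be plugged in, so meaning-grouping only happens *inside* a tier and never pulls a rare word ahead of a common tier.

```python
def frequency_tiers(words, sizes=(500, 1500), rest=2000, reorder=None):
    """Split a frequency-ordered list into tiers of 500, 1500, then 2000 each.

    reorder: optional function applied to each tier (e.g. a kanji-order sort).
    """
    tiers, start = [], 0
    size_iter = iter(sizes)
    while start < len(words):
        size = next(size_iter, rest)  # falls back to `rest` once sizes run out
        tier = words[start:start + size]
        tiers.append(reorder(tier) if reorder else tier)
        start += size
    return tiers

# Usage: 5,000 dummy words -> tiers of 500, 1500, 2000 and a final 1000.
words = [f"w{i}" for i in range(5000)]
print([len(t) for t in frequency_tiers(words)])
```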

If anyone's done this, is there a link to the post about it? I realize there are resources for most foreign languages out there. The problem with such resources is that it's hard to get things from a book into an Anki-friendly spreadsheet. This method seems like one that could work for the average do-it-yourself, structured-learning type of person who isn't interested in mining a college textbook glossary.

Last edited by Nukemarine (2012 April 10, 9:13 am)

Reply #2 - 2012 April 11, 7:59 am
cangy Member
From: 平安京 Registered: 2006-12-13 Posts: 372 Website

nice idea!  sorting by kanji probably wouldn't be that great though, as you are just introducing kanji that can be used, so it'd be ok right at the start but less useful later on.  some kind of automatic clustering would probably be better, or use a semantic reference such as wordnet

Reply #3 - 2012 April 11, 9:39 pm
Nukemarine Member
From: 神奈川 Registered: 2007-07-15 Posts: 2347

cangy wrote:

nice idea!  sorting by kanji probably wouldn't be that great though, as you are just introducing kanji that can be used, so it'd be ok right at the start but less useful later on.  some kind of automatic clustering would probably be better, or use a semantic reference such as wordnet

Huh, that's a really neat idea. I was thinking grouping was something that had to be done manually. It makes sense that Google and other search engines have grouping data. Is cluster data available for free from Google or other data sites? Any idea on the best way to sort clusters?

Word frequency lists should be much easier to generate. In my opinion, digital books are the best source, though given the proliferation of blogs, those can be just as useful. By "useful" I mean that going off only news articles skews the list toward political and business terms.

cangy Member
From: 平安京 Registered: 2006-12-13 Posts: 372 Website

this should be applicable to kanji words http://en.wikipedia.org/wiki/Cluster_analysis

Last edited by cangy (2012 April 12, 12:30 am)
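As a concrete (and deliberately simple) example of cluster analysis on kanji words, here's a single-linkage sketch: two words are linked when the Jaccard similarity of their gloss kanji sets is at least `threshold`, and each cluster is a connected component of those links. That is one choice among many (k-means, WordNet-based similarity, etc. would give different groupings), and the pairwise loop is O(n²), so it's only meant for small lists as written. The second function answers the "how to sort clusters" question in one possible way: order clusters by their most frequent member. All names and the sample glosses are hypothetical.

```python
from itertools import combinations

def cluster_by_shared_kanji(glosses, threshold=0.5):
    """Single-linkage clustering on gloss kanji sets (Jaccard similarity).

    Words whose kanji sets have similarity >= threshold are linked;
    clusters are the connected components. O(n^2) pairs: small lists only.
    """
    kanji = {w: {ch for ch in g if "\u4e00" <= ch <= "\u9fff"}
             for w, g in glosses.items()}
    parent = {w: w for w in glosses}

    def find(w):  # union-find root lookup with path halving
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in combinations(glosses, 2):
        sa, sb = kanji[a], kanji[b]
        if sa and sb and len(sa & sb) / len(sa | sb) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for w in glosses:
        clusters.setdefault(find(w), []).append(w)
    return list(clusters.values())

def order_clusters(clusters, words_by_frequency):
    """Sort clusters so the one containing the most frequent word comes first."""
    rank = {w: i for i, w in enumerate(words_by_frequency)}
    return sorted(clusters, key=lambda c: min(rank.get(w, len(rank)) for w in c))

# Illustrative glosses: 学校/学生 share 学, 学生/先生 share 生 (Jaccard 1/3 each).
glosses = {"école": "学校", "étudiant": "学生", "professeur": "先生", "chat": "猫"}
clusters = cluster_by_shared_kanji(glosses, threshold=0.3)
print(order_clusters(clusters, ["chat", "étudiant", "école", "professeur"]))
```

Note the chaining effect typical of single linkage: "école" and "professeur" share no kanji but still end up together via "étudiant". A higher threshold (or average linkage) reduces that.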
