I'm really curious how these decks were made and whether it's possible to do the same for another language. Is there a program out there that can scan ebooks, websites, etc. and make these lists? And is there a program that could convert such a list into cards?
I would really like to have something like that for Korean. I have a 1700-card deck (3400 if you count the generated reversed cards) that I typed by hand, but I would like something larger and better organized.
erlog
Member
From: Japan
Registered: 2007-01-25
Posts: 518
The thing you need to understand about a lot of this data is that it wasn't really created by any of the people in any of the places you're looking. Breen did some cross-referencing, added that frequency data to EDICT, and then people just dumped the data out of EDICT into flashcards.
I believe the frequency data is based on a study of newspapers that was done somewhere else.
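If you want to see what the counting step looks like on your own text, here is a rough sketch in Python. It is not how the EDICT frequency data was produced; the function names and the sample sentence are made up for illustration. It splits on word characters, which is only an assumption that works for a demo: Korean attaches particles to words, so a real list would need a morphological analyzer before counting. The output is tab-separated text, which flashcard programs such as Anki can import, though you would still have to fill in the definitions yourself.

```python
import csv
import io
import re
from collections import Counter


def build_frequency_list(text, top_n=None):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Crude tokenization: runs of word characters, lowercased.
    # Assumption: fine for a sketch, too naive for real Korean text.
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(top_n)


def to_tsv(freq_list):
    """Render the list as tab-separated lines: word, then count."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for word, count in freq_list:
        writer.writerow([word, count])
    return buffer.getvalue()


# Hypothetical sample text standing in for an ebook or web page.
sample = "the cat saw the dog and the dog saw the cat"
freq = build_frequency_list(sample)
tsv = to_tsv(freq)
print(tsv)
```

The counting is the easy part. Cleaning the source text, splitting words properly, and attaching meanings is where the hours go.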
This project is probably going to take you more than 10 hours' worth of work. Also, it's really unclear how useful frequency data actually is. Word frequencies always follow a Zipf distribution.
http://en.wikipedia.org/wiki/Zipf%27s_law
Once you get past roughly the first 1,000 words, the word frequencies become almost equal to one another as far as any practical use of the number goes. In absolute terms, the 4,000th word tends to be only slightly more common than the 15,000th.
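To put rough numbers on that, here is a small sketch that assumes an idealized Zipf distribution, where a word's frequency is proportional to 1/rank. The exponent of 1 and the 50,000-word vocabulary are assumptions chosen for illustration, not measurements from any real corpus.

```python
def zipf_share(rank, vocab_size, exponent=1.0):
    """Share of all word occurrences taken by the word at `rank`
    under an idealized Zipf distribution (an assumption, not real data)."""
    normalizer = sum(1.0 / r ** exponent for r in range(1, vocab_size + 1))
    return (1.0 / rank ** exponent) / normalizer


VOCAB = 50_000  # hypothetical vocabulary size

# Print each rank's share of running text as a percentage.
for rank in (1, 1_000, 4_000, 15_000):
    print(rank, f"{zipf_share(rank, VOCAB):.6%}")
```

Under these assumptions the 4,000th word is still a few times more frequent than the 15,000th as a ratio, but both make up such a tiny share of running text that the gap between them hardly matters when deciding what to study next.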
Last edited by erlog (2012 July 06, 12:33 am)