cb4960 Wrote:
tuliaoth Wrote:
Anyway, I'm beginning to see that it may be too difficult to sort a custom vocab list to prioritize the most useful words. Unless one of the generous programmers here is ever attracted to the idea, I'll just try to be more selective and keep going as I have.
I wrote a simple program called cb's Frequency List Sorter that sorts a list of Japanese words based on their frequency. See the second part of this post: http://forum.koohii.com/showthread.php?p...#pid167828
Excellent work! With the data already available, harnessing it like this was bound to happen sooner or later. I copy-pasted your full list into Excel and found that...
The 10k most frequent words have a frequency index of 2,317 or above.
The 20k most frequent words have a frequency index of 949 or above.
The 30k most frequent words have a frequency index of 528 or above.
The 40k most frequent words have a frequency index of 328 or above.
The 50k most frequent words have a frequency index of 220 or above.
(From a total of 238,265 unique words.)
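For anyone who'd rather skip Excel, the cutoffs above could be reproduced with something like the sketch below. It assumes the frequency indexes have already been loaded into a plain list of numbers. The function name and the toy numbers are made up for illustration and are not taken from cb's program or the real list.

```python
def cutoffs_by_rank(frequencies, ranks):
    """Return {rank: lowest frequency index among the top `rank` words}."""
    # Most frequent first, so position rank - 1 holds the cutoff for that rank.
    ordered = sorted(frequencies, reverse=True)
    # Ranks beyond the end of the list are skipped rather than raising an error.
    return {rank: ordered[rank - 1] for rank in ranks if 0 < rank <= len(ordered)}


# Toy data only: eight invented frequency indexes.
toy_frequencies = [50, 7, 120, 3, 18, 990, 1, 64]
toy_cutoffs = cutoffs_by_rank(toy_frequencies, [2, 4, 6, 100])
print(toy_cutoffs)
```

With the real list you would pass ranks like 10000, 20000 and so on instead of the toy ones.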
As I see it, an intermediate reader focused on vocab acquisition has no reason not to use the word frequency list generator. Even though the corpus is 'limited' to 5,000 novels, I assume it's still a fairly accurate indicator of a word's absolute frequency in the language. In fact, I'm surprised this sorting-by-frequency idea wasn't brought up before by post-core learners.
Sorting collected vocab by frequency yields significant benefits:
- You will fill larger vocab gaps before smaller ones, so reading itself gets easier vocab-wise as quickly as possible, much as it does with texts that keep reusing JLPT vocab.
- You can learn less vocab without compromising the overall progress of your reading ability. Simply decide not to learn words whose frequency index is lower than X.
- You have an easier time SRSing, as you're more likely to re-encounter the words you're reviewing. How likely depends on the average frequency index of your word list.
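To make the "sort, then drop everything below X" idea concrete, here is a minimal sketch. It assumes you have your collected words in a list and the frequency data in a dictionary mapping word to frequency index. The names `prioritize` and `min_index` are hypothetical, the numbers are invented, and treating a word that's missing from the frequency data as index 0 is my own assumption, not how cb's sorter necessarily behaves.

```python
def prioritize(words, freq_index, min_index=0):
    """Sort words from most to least frequent, dropping those below min_index."""
    # Assumption: a word absent from the frequency data counts as index 0.
    kept = [w for w in words if freq_index.get(w, 0) >= min_index]
    return sorted(kept, key=lambda w: freq_index.get(w, 0), reverse=True)


# Invented frequency indexes, for illustration only.
toy_index = {"食べる": 5000, "猫": 1200, "憂鬱": 300, "齟齬": 15}
my_vocab = ["齟齬", "猫", "憂鬱", "食べる", "未知語"]

study_order = prioritize(my_vocab, toy_index, min_index=220)
print(study_order)
```

Here the X is set to 220, so anything outside roughly the top 50k in the real list would be skipped. Raise or lower `min_index` depending on how much vocab you're willing to take on.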
Thanks for this. Good idea, effective implementation and possibly amazing results.