Counting the words you know? - Printable Version

+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Counting the words you know? (/thread-3727.html)

Counting the words you know? - yukamina - 2009-08-09

There's a point when you can't keep track of how many words you know. At least, for people who don't catalog and SRS everything. I don't know whether I know 7,000 words or 10,000 or more. It doesn't really matter, but I'd like to know, especially so I can go over the words I should know but have forgotten. I thought I could use Trinity and enter all the words I want to keep track of, but I realize that I can't start a new account :/ Is there another program that will keep track of how many unique words you add to it?

Counting the words you know? - mentat_kgs - 2009-08-09

Uhm, an anki plugin would be neat.

Counting the words you know? - mafried - 2009-08-10

I can tell you how you'd go about writing one. But if someone's already gone and written it, that'd be news to me.

Counting the words you know? - yukamina - 2009-08-10

Well, I can't write computer programs. I'll find some other alternative. Anki will prevent you from adding a card that's the same as another, right? I guess I'll just use that.

Counting the words you know? - bandwidthjunkie - 2009-08-10

I've thought about this. If you were learning a language that uses exactly one character (i.e. " ") to separate words, discounting contractions like "I've" and "there're", such as English or any other language that uses the Roman alphabet, then it is fairly straightforward to count the words you have learned quite accurately, since you need no underlying knowledge of the language to count the distinct words in your SRS sentences. This doesn't quite capture all the information, though: arguably, knowing "press" and "conference" doesn't automatically mean you know "press conference", so if "press conference" appeared, perhaps it should be treated as a word in its own right. I suppose this is more a matter of personal opinion.

However, determining word boundaries in Japanese strikes me as a much more involved problem, and one that would almost certainly require knowledge of the grammar and a comprehensive dictionary to approximate accurately.

Counting the words you know? - Tobberoth - 2009-08-10

bandwidthjunkie Wrote: I've thought about this. If you were learning a language that uses exactly one character (i.e. " ") to separate words, discounting contractions like "I've" and "there're", such as English or any other language that uses the Roman alphabet, then it is fairly straightforward to count the words you have learned quite accurately, since you need no underlying knowledge of the language to count the distinct words in your SRS sentences. This doesn't quite capture all the information, though: arguably, knowing "press" and "conference" doesn't automatically mean you know "press conference", so if "press conference" appeared, perhaps it should be treated as a word in its own right. I suppose this is more a matter of personal opinion. However, determining word boundaries in Japanese strikes me as a much more involved problem, and one that would almost certainly require knowledge of the grammar and a comprehensive dictionary to approximate accurately.

Let's not forget all the words you know which aren't in your SRS.

Counting the words you know? - bandwidthjunkie - 2009-08-10

Tobberoth Wrote: Let's not forget all the words you know which aren't in your SRS.

How could you possibly have learned anything that isn't in your SRS lol

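The space-separated case bandwidthjunkie describes can be sketched in a few lines of Python. This is only a minimal illustration, not anyone's actual plugin: the function name `count_distinct_words`, the regular expression, and the two sample sentences are all made up for the example. The `tokenize` parameter is an assumption that leaves room to plug in a real segmenter for Japanese, where splitting on spaces does not work.

```python
import re
from collections import Counter


def count_distinct_words(sentences, tokenize=None):
    """Count how often each distinct word form appears across the sentences.

    `tokenize` is a hypothetical hook: any callable that turns one sentence
    into a list of words. For Japanese it would have to wrap a real
    morphological analyzer, since there are no spaces to split on.
    """
    if tokenize is None:
        # Default for space-separated text: lowercase, then keep runs of
        # letters and apostrophes so that "I've" stays a single token.
        def tokenize(sentence):
            return re.findall(r"[a-z']+", sentence.lower())

    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


# Made-up sample sentences standing in for exported SRS cards.
sentences = [
    "The press conference starts at noon.",
    "I've read the press release.",
]
counts = count_distinct_words(sentences)
print(len(counts), "distinct words")
```

Note that this counts "press" and "conference" separately, so the compound "press conference" is not recognized as a word of its own, which is exactly the limitation raised in the post above.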
Counting the words you know? - radical_tyro - 2009-08-10

bandwidthjunkie Wrote: I've thought about this. If you were learning a language that uses exactly one character (i.e. " ") to separate words, discounting contractions like "I've" and "there're", such as English or any other language that uses the Roman alphabet, then it is fairly straightforward to count the words you have learned quite accurately, since you need no underlying knowledge of the language to count the distinct words in your SRS sentences. This doesn't quite capture all the information, though: arguably, knowing "press" and "conference" doesn't automatically mean you know "press conference", so if "press conference" appeared, perhaps it should be treated as a word in its own right. I suppose this is more a matter of personal opinion. However, determining word boundaries in Japanese strikes me as a much more involved problem, and one that would almost certainly require knowledge of the grammar and a comprehensive dictionary to approximate accurately.

fortunately, strong people have done the hard work. see for example mecab or kakasi.

Counting the words you know? - mafried - 2009-08-10

Actually I've thought about it some more, and there might be an even easier way. I'll look into writing a plugin tonight or tomorrow. I know this is a "just google it" sort of question, but I'm very short on time: does anyone have a machine-readable list of vocab for the JLPT levels? (Preferably the new ones, but both would be ideal.) If you're interested in the plugin, this would help it along.

Counting the words you know? - Tobberoth - 2009-08-10

mafried Wrote: Actually I've thought about it some more, and there might be an even easier way. I'll look into writing a plugin tonight or tomorrow.

http://www.thbz.org/kanjimots/jlpt.php3

Counting the words you know? - mafried - 2009-08-10

Perfect. Thanks, Tobberoth.

Counting the words you know? - radical_tyro - 2009-08-11

mafried, i think the jlpt vocabulary shared anki deck is better. it has more words, and it doesn't include levels 3 and 4 in the level 2 file. also, it doesn't list a bunch of words in kana in level 4. what are you trying to do? i can send you the python code i used for http://forum.koohii.com/showthread.php?pid=65607#pid65607 if you want.

Counting the words you know? - mafried - 2009-08-11

The same thing you already did, it seems. I wasn't aware of your efforts. Thanks for the link. I am/was in the process of writing a plugin that would run the "Expression" or "Kanji" fields of a Japanese deck through mecab to extract a conjugation-free word list, then print statistical information gleaned from that (not unlike what you have posted). The JLPT data would have been for comparison's sake. I was also planning on showing a weighted percent coverage of a frequency list. How is your Python script set up?

Counting the words you know? - Codexus - 2009-08-11

What I did to count the words in my SRS was export the content, use the chasen parser to break down the sentences, and then it was just a matter of counting the unique words in the result. I wonder if it's a good thing to know how many words you know. It's easy to become afflicted by the "are we there yet?" syndrome: "I only know 5,000 words!? But I need at least 20,000, it's going to take me forever." *discouragement ensues*

Counting the words you know? - nac_est - 2009-08-11

Tobberoth Wrote: Let's not forget all the words you know which aren't in your SRS.

That's very true. Every time I look at the list of Jouyou Kanji that I have not yet entered into my sentence deck (it's a feature of anki), I can come up with one or more words that I already know for each of them. Osmosis is amazing.

Counting the words you know? - blackmacros - 2009-08-11

This plugin would be super useful for me right now.
It would be very convenient to know exactly how many words I know so that I can adjust (and hopefully reduce...) the number of words I need to learn for the JLPT in December.

Counting the words you know? - radical_tyro - 2009-08-11

mafried Wrote: The same thing you already did, it seems. I wasn't aware of your efforts. Thanks for the link.

here you go. i put some comments in it for you. hope it's useful. frequency list info would be cool. looking forward to seeing your improvements. http://dl.getdropbox.com/u/1144428/word%20analysis.zip
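The two statistics mafried mentions, comparison against JLPT vocabulary lists and weighted percent coverage of a frequency list, could look roughly like the sketch below. It is not the code from the zip linked above or from any existing plugin. It assumes the deck has already been reduced to a set of dictionary-form words (for example by a segmenter such as mecab or chasen), and the names `level_coverage` and `weighted_coverage`, the level labels, and all of the sample words and counts are placeholders invented for illustration.

```python
def level_coverage(known_words, level_lists):
    """For each level, return (words known, words in the level list)."""
    known = set(known_words)
    result = {}
    for level, words in level_lists.items():
        level_set = set(words)
        result[level] = (len(known & level_set), len(level_set))
    return result


def weighted_coverage(known_words, frequencies):
    """Fraction of all occurrences in a frequency list that are known words.

    `frequencies` maps each word to its occurrence count in some corpus,
    so knowing a common word contributes more than knowing a rare one.
    """
    known = set(known_words)
    total = sum(frequencies.values())
    if total == 0:
        return 0.0
    covered = sum(count for word, count in frequencies.items() if word in known)
    return covered / total


# Placeholder data only: these are NOT real JLPT lists or real corpus counts.
known = {"食べる", "水", "学校"}
level_lists = {
    "level 4": ["食べる", "水", "本"],
    "level 3": ["学校", "経験"],
}
frequencies = {"食べる": 50, "水": 30, "本": 15, "経験": 5}

for level, (have, total) in level_coverage(known, level_lists).items():
    print(f"{level}: {have}/{total} words known")
print(f"weighted coverage: {weighted_coverage(known, frequencies):.0%}")
```

As several posters point out, a count like this only reflects what is in the deck; words picked up outside the SRS are invisible to it.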