Amount of vocabulary - Printable Version
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: The Japanese language (http://forum.koohii.com/forum-10.html)
+--- Thread: Amount of vocabulary (/thread-10309.html)

Amount of vocabulary - Irixmark - 2012-12-27

vileru Wrote:
    To change the topic, a Japanese linguist once told me that the amount of vocabulary needed to reach newspaper/academic/novel literacy in Japanese is much higher than in English. He explained that, whereas the vocabulary used in spoken and written English corresponds for the most part, the huge gap between written and spoken Japanese stands as an obstacle for learners. My experience confirms this. How about everyone else?

Very much my experience as well. Japanese newspapers don't seem to put bounds on the words they can use. In that sense Japanese is a bit like Spanish: you can be pretty much fluent in spoken Spanish and still be unable to read a more sophisticated novel without a dictionary. I often don't have a sense of whether a word is uncommon but potentially usable in conversation, or really confined to rarefied disquisition (ha, there you go, two GRE words in one sentence!). So while the entry barrier to reading newspapers or novels is higher, fortunately you only need to be able to read these words, unless you aspire to write in Japanese yourself.

Academic writing in the social sciences is a lot simpler in my experience because so many of the terms are literal translations from English or some other European language. It's easy to understand 量的金融緩和政策 in context if you know the concept in English and the four separate compounds.

Amount of vocabulary - Aspiring - 2014-01-21

This site applies the typical science used at koohii. http://www.vocabulary.com/howitworks/

***

I'd often use Anki and a dictionary, but I found this to be a better alternative. The quick dictionary and schedule features make the site comparable to real-time import (rikaichan). Both, by the way, are wonderful add-ons. The site also has a parser that selects unique words.

To remain relevant to the category, "The Japanese Language", the following link has adumbrated language learning topics such as:
"Study Finds Listening Essential for Learning New Words"
"distributed practice" and "practice testing"
"New brain research suggests that forgetting is not a sign of weakness"
"The Forgetting Curve"
http://www.vocabulary.com/articles/under-the-hood/

Amount of vocabulary - dizmox - 2014-01-22

Even if you know enough words to "get by", it's incredibly frustrating not to be able to express oneself eloquently when the occasion arises. If one's writing for an audience (even just people on a message board or blog/twitter), it's not necessarily what you say but how you say it, so a wide arsenal of active vocabulary is a great thing to have. We're not Japanese children, so we don't have 18+ years to learn by osmosis. Without Anki it would have been very difficult for me to build up that repertoire rapidly.

Amount of vocabulary - ktcgx - 2014-01-22

vileru Wrote:
    Fillanzea Wrote:
        I got 34,900. (Native speaker of English, published writer.) I think knowing words like "adumbrate" and "uxoricide" is really a test of whether you read a lot, especially of literature and essays written in the 18th-19th centuries.
    "Adumbrate" is in many GRE vocab lists and, although "uxoricide" is not, "uxorious" is (which makes it easy to infer the meaning).

Ok, so I know this comment is more than a year old, but I missed "adumbrate", and just looked it up, and I'm really kicking myself because I should have been able to tell what that word means, given it's clearly "ad" + "umbra". Duh.

Amount of vocabulary - nadiatims - 2014-01-22

Fluency isn't really about word count so much as having a supreme amount of audio exposure to and practice of the basic vocabulary. Little kids are well and truly fluent and can even read pretty well long before they ever have anything resembling an adult vocabulary.

Having said that, growing your on-paper vocabulary is one of the best ways to unlock that kind of fluency-creating audio exposure. Without it you'll understand next to nothing of anything interesting. So unless you find yourself some kind of L2 parent who's happy to find you comprehensible input, you won't get very far at all. Audio unlocks listening comprehension and leads to fluency. But a lot of reading unlocks audio by giving you the vocabulary you need to get anything out of it. So vocabulary is very important, but there's not really any reason to keep count. As long as you're learning new words you should be improving your reading and listening comprehension.

Amount of vocabulary - Inny Jan - 2014-01-22

nadiatims Wrote:
    Fluency isn't really about word count so much as having a supreme amount of audio exposure to and practice of the basic vocabulary.

Are we talking "fluency" again? How about we stop using a word that everyone seems to understand differently and fill in the table below:

Code:
Skill    Volume    Speed

Amount of vocabulary - Stansfield123 - 2014-01-22

vileru Wrote:
    This is true only for common words. Once you start acquiring rare words that only show up once per 1,000,000 words, then that's when flash cards are useful. In fact, I use flash cards for rare English vocabulary.

A couple of points:

1. Words refer to concepts. Flash cards only help you learn words that are attached to concepts you already know. For instance, synonyms of more common words, or, in the case of a second language, words that translate to words you already know in your first language. But, past a certain point, different languages no longer refer to the same concepts; they come with their own concepts (their own way of categorizing things). You can't learn that from flash cards. If you want to learn words that refer to new concepts, you need to actually read materials that explain and use those concepts.

Even with the synonyms, flash cards won't help you learn the reason most synonyms exist: nuance. A flash card will rarely if ever explore the full meaning of a word. With many words, it takes entire essays to explore their full meaning.

2. Not all reading materials are created equal. I assure you, if you spend the time it would take to cram over a hundred words using flash cards on reading James Joyce's Ulysses instead (with the help of a dictionary), you'll probably rack up quite a few words each day. And the person showing you how to use each word is a master of the English language: he will teach you more about each word than a flash card ever could.

3. This is just a different way of stating the above, but: a dictionary, even the biggest, best dictionary, is a very, very, very, ..., very modest attempt to describe a language. Flash cards (the kind that actually let you learn 100+ of them a day) are a brief excerpt from each word's dictionary entry.

Conclusion: learning tens of thousands of words with flash cards will never make you the master of a language. Flash cards are useful for beginners: they help when the words to be learned refer to basic concepts, and reading is not an option. But to master a language, you must learn from the masters of that language. Mastering a language isn't about memorizing words, it's about exploring them. As far as memorizing goes, sure, you do have to memorize them. But that requires no special attention. You memorize a word as you explore it.

Amount of vocabulary - nadiatims - 2014-01-22

Think of the definition on a flash card or in a dictionary as just a keyword that quickly gives an overall sense of what a word means. It's just something to use as a memory hook and to provide a hint (even if somewhat vague) when you're stuck on what something means. Of course you can only learn nuances from tons of exposure.

However, I don't know how anyone approaches native materials without either taking a huge amount of time or using some kind of mass vocabulary strategy, i.e. flash cards, word lists, bilingual materials, etc.

Amount of vocabulary - Aspiring - 2014-01-22

@Stansfield123 When I read your post I immediately thought back to a reply made by Vileru, on 2013 August 25, 7:52 am. Both arguments are interesting.

Dictionaries provide a word's meaning, derivation, variations, and pronunciation. Dictionaries vary in content. The types of dictionaries range from beginner to intermediate to advanced. Children's dictionaries often seem like picture books. Intermediate dictionaries are very approachable and easy to understand. Collegiate dictionaries are brief and are best used for neutral terms.

Amount of vocabulary - egoplant - 2014-01-22

To reply to myself of 1 year ago, my opinion now is that you need a vocabulary of 20k+ to fluently read most things. I think the people saying 6k or 10k are just guessing and not actually speaking from experience, although I still only know 12k (in Anki at least), so I'm doing the same thing with my 20k estimate. All I know is that with 12k I still look up words constantly.

Amount of vocabulary - afterglowefx - 2014-01-22

I'd really question any sort of online vocabulary test. I had a go just for fun and wound up just a touch over 30k words. Which is fine, I don't really know what it means, but it's fine. But then I checked out their statistics page and found that it put me at middle-of-the-road average in vocab. Which is funny, because I recently took the GRE and wound up 99th percentile. And for those of you who haven't taken the GRE, the only people who take the test are college graduates intent on pursuing MAs and PhDs--generally speaking (exceptions surely abound!), bright people. It's not open to whoever wanders in from the murky depths of the interwebz.

I still don't know what 30,000 words really means, but I do know that anything claiming to give you an accurate assessment of anything in 5 minutes with a handful of questions is only in it for the ad revenue.

Amount of vocabulary - ktcgx - 2014-01-22

afterglowefx Wrote:
    I'd really question any sort of online vocabulary test. I had a go just for fun and wound up just a touch over 30k words. Which is fine, I don't really know what it means, but it's fine. But then I checked out their statistics page and found that it put me at middle-of-the-road average in vocab. Which is funny, because I recently took the GRE and wound up 99th percentile. And for those of you who haven't taken the GRE, the only people who take the test are college graduates intent on pursuing MAs and PhDs--generally speaking (exceptions surely abound!), bright people. It's not open to whoever wanders in from the murky depths of the interwebz.

I'm a bit confused by what exactly they mean. I think it's certainly possible that they mean that you know at least one definition for 30,000 words, which might include words with more than one definition, even though it only "tests" you on one. For example the previously mentioned "running". It's not clear if this test would count those 3 separate words as 3 words of your 30,000, or 1.

Amount of vocabulary - afterglowefx - 2014-01-22

ktcgx Wrote:
    I'm a bit confused by what exactly they mean. I think it's certainly possible that they mean that you know at least one definition for 30,000 words, which might include words with more than one definition, even though it only "tests" you on one. For example the previously mentioned "running". It's not clear if this test would count those 3 separate words as 3 words of your 30,000, or 1.

I've precious little faith in something claiming to give me a number as precise as 30,140 when it took me all of five minutes to check a few boxes.
On top of that, there's the problem you just mentioned, which has been highlighted throughout this entire thread.

Further, and most importantly, knowing a word does not just consist in knowing a definition of the word, but in understanding how it relates, connects to, and corresponds with the rest of the language. This all of course has implications for our learning a second language, as Japanese clearly does not share the same intricate pattern of relations as English does. While one could sit down with Core/Anki for a year and claim to know a definition of 15,000 words, they'll in no sense be fluent in the language. Putting the words together and seeing how they relate is where the magic happens. Knowing a word's definition is a necessary condition for knowing said word, but it's not a sufficient one.

Amount of vocabulary - ktcgx - 2014-01-22

afterglowefx Wrote:
    ktcgx Wrote:
        I'm a bit confused by what exactly they mean. I think it's certainly possible that they mean that you know at least one definition for 30,000 words, which might include words with more than one definition, even though it only "tests" you on one. For example the previously mentioned "running". It's not clear if this test would count those 3 separate words as 3 words of your 30,000, or 1.
    I've precious little faith in something claiming to give me a number as precise as 30,140 when it took me all of five minutes to check a few boxes. On top of that, there's the problem you just mentioned, which has been highlighted throughout this entire thread.
    Further, and most importantly, knowing a word does not just consist in knowing a definition of the word, but in understanding how it relates, connects to, and corresponds with the rest of the language.

Oh yes, don't get me wrong, I highly doubt that test is in any way accurate, it's basically a bit of fun online, lol. I'd say they've probably gone for a 90-95% confidence interval, but clearly, without knowing exactly what they've done to arrive at that figure, I don't trust it, haha. I'd like to think I know more than the 35k words it said I did.

Amount of vocabulary - Stansfield123 - 2014-01-22

Aspiring Wrote:
    @Stansfield123

I agree with what Vileru is saying there. I rarely "guess" the meaning of a term. I don't use an actual dictionary, but I always use Google to read up on a word or idiom, rather than misuse or misunderstand it. However, that doesn't mean he's right about the flash cards or dictionary entries. Sometimes a dictionary entry is enough, but often there's a need for a lengthy explanation, like a Wikipedia page, a forum thread or an encyclopedia entry (in the case of terms used in philosophy, for instance), or even more research, integration and gradual understanding.

Amount of vocabulary - Stansfield123 - 2014-01-22

afterglowefx Wrote:
    I've precious little faith in something claiming to give me a number as precise as 30,140 when it took me all of five minutes to check a few boxes. On top of that, there's the problem you just mentioned and has been highlighted throughout this entire thread.

They're not claiming to give you an exact number. If you go to the page describing the method, the very first sentence clears that up. They give you the result of the method they used. That's all they're promising, and that's what you're getting. The result is likely off by thousands. And it's not black magic: they applied general principles from statistics to a massive source of materials (the British National Corpus) to devise a method that's more exact than just picking a random sample of words. Statistics can be used very effectively to increase precision without increasing sample sizes.

Obviously, a bigger sample of words would be nice, but I think their main concern was to get as many people as possible to take the test, rather than to give the most exact individual result. They seem more interested in gathering stats about groups of people than in providing a test as a public service. The test itself is a carrot they're dangling to get all the bunnyrabbits to participate.

Amount of vocabulary - afterglowefx - 2014-01-22

Stansfield123 Wrote:
    ... The test itself is a carrot they're dangling to get all the bunnyrabbits to participate.

Marshaling statistics and a clever analysis of a large corpus of English is all fine and dandy, but I still question the worth of the end result. I don't see how ticking 100-odd boxes is able to output something as mind-bogglingly complicated as a person's vocabulary, which, again, consists in far more than collections of definitions (and in reading this thread it seems we are quite agreed on that point).

Problems with the method aside, there's no quality control whatsoever. If I'm considered to be middle-of-the-road in literacy, either everything I've experienced in my many years of university schooling (including the results of one of the most widely-accepted standardized measures of English proficiency in the world) is off by a mile, or about half the people taking our little test have been playing "click all the boxeses!!" in stark defiance of the purpose of the test.

Amount of vocabulary - WataruFord - 2014-01-24

egoplant Wrote:
    Can somebody give me a rough idea about vocabulary numbers? It seems like I can't get a concrete answer (maybe because there isn't one). For example, how much vocabulary for shounen manga, seinen manga, visual novels, light novels, novels, and how much does the average kid, teenager, adult, educated adult have? I know it depends on the specific title you're reading, or the specific person, but I tried to divide it into categories. Please try and give actual numbers, thanks.

The 3000 most common words in novels will allow you to understand about 84% of novels. These were the results of one person's word frequency analysis:

The first 100 words on the list make up 57.2% of the text that was processed. The first 500? 70.3%. The first 1000? 76.2%. The first 3000? 85.4%. The first 10,000? 94.1%.

http://www.offbeatband.com/2010/12/the-most-commonly-used-japanese-words-by-frequency/

Amount of vocabulary - s0apgun - 2014-01-24

84% of words in novels***

Knowing words doesn't mean comprehension.

Amount of vocabulary - dizmox - 2014-01-24

afterglowefx Wrote:
    Stansfield123 Wrote:
        ... The test itself is a carrot they're dangling to get all the bunnyrabbits to participate.
    Marshaling statistics and a clever analysis of a large corpus of English is all fine and dandy, but I still question the worth of the end result. I don't see how ticking 100-odd boxes is able to output something as mind-bogglingly complicated as a person's vocabulary, which, again, consists in far more than collections of definitions (and in reading this thread it seems we are quite agreed on that point).

It's not that mind-boggling or complicated at all. It's perfectly reasonable in fact. Take for example the following situation.

Suppose you have a bag of 100,000 balls representing words. Say X% are red and (100-X)% are blue. The red balls represent words you know, and the blue ones words you don't. If you pick balls at random out of this bag you will very quickly be able to estimate the proportion of red versus blue. The number of balls in the bag isn't even relevant here, only your sample size. It's not hard to show that with a sample size of 100 you'll generally be accurate to within a few percent or so. In reality it's slightly more complicated, but not too much.

Amount of vocabulary - lauri_ranta - 2014-01-25

WataruFord Wrote:
    The first 100 words on the list make up 57.2% of the text that was processed.

Numbers like that depend a lot on the length of the word frequency list and on what kind of words are included in it. The distribution of most word frequency lists is approximately linear on a log-log scale for a discrete distribution, or approximately linear on a lin-log scale for a cumulative distribution. See https://en.wikipedia.org/wiki/Zipf's_law.
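
As a rough sketch of what the classic form of Zipf's law implies for coverage: if the word at rank r occurs with frequency proportional to 1/r, then the top k words of an N-word list account for H_k/H_N of all occurrences, where H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number. The function names `harmonic` and `zipf_coverage` and the 50,000-word list size are assumptions chosen for illustration; none of the real frequency lists discussed in this thread follow the idealized law exactly.

```python
def harmonic(n: int) -> float:
    """Return the n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / rank for rank in range(1, n + 1))


def zipf_coverage(top_k: int, list_size: int) -> float:
    """Share of all word occurrences covered by the top_k most frequent
    words, assuming an idealized Zipf distribution (frequency ~ 1/rank)
    over a list of list_size words."""
    if not 1 <= top_k <= list_size:
        raise ValueError("top_k must be between 1 and list_size")
    return harmonic(top_k) / harmonic(list_size)


if __name__ == "__main__":
    list_size = 50_000  # hypothetical list length, for illustration only
    for top_k in (100, 1_000, 10_000):
        share = zipf_coverage(top_k, list_size)
        print(f"top {top_k:>6} of {list_size}: {share:.1%}")
```

Under these assumptions the top 10,000 of 50,000 words come out at roughly 86% of occurrences. A shorter list gives a higher share for the same top_k, which is one reason coverage percentages from different lists are hard to compare.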

Here are the discrete probability distributions for three Japanese word frequency lists, where the x-axis is the frequency rank:

[Image: discrete distributions for the three lists]

And here are the cumulative distributions:

[Image: cumulative distributions for the three lists]

The top 10,000 words account for about 87% of all occurrences for the longest list (with about 300,000 words), about 96% for the second longest list (with about 60,000 words), and about 98% for the shortest list (with 15,000 words). The longest list (word_freq_report_jparser.txt) includes many proper nouns but the other two don't.

If a word frequency list follows the classic version of Zipf's law, then for a list with 50,000 words, the top 10,000 words should account for about H_{10000}/H_{50000} = (1 + 1/2 + ... + 1/10000)/(1 + 1/2 + ... + 1/50000) ≈ 86% of all occurrences. The percentages are higher for the lists above because, for example, the Leeds word frequency list is cut off after 15,000 words, two of the lists don't include proper nouns, and analyzers like MeCab don't recognize all words.

Amount of vocabulary - Haych - 2014-01-25

lauri_ranta Wrote:
    http://19a5b0.s3-website-us-west-2.amazonaws.com/freqcumulative.png

That's a good graph, and it would seem to agree with the 10,000 number that people are often quoting, since that's about where the common sources lose the linear trend. It's crazy to think how going from 10 to 100 has just as much benefit as going from 1,000 to 10,000. Then after that, you are getting sub-linear results on an exponential scale, which is pretty terrible.

The literary one seems to indicate you should know more, but at that point, I think the program probably wouldn't be doing a good job of distinguishing between very similar items, so I'm a bit skeptical. It would take more than that to make me advocate for 50,000 words.

Amount of vocabulary - WataruFord - 2014-01-27

lauri_ranta Wrote:
    WataruFord Wrote:
        The first 100 words on the list make up 57.2% of the text that was processed.
    Numbers like that depend a lot on the length of the word frequency list and what kind of words are included on the word frequency list.

The blog says the algorithm compiled over 65 million words. Beyond that I don't know how it was done.

Amount of vocabulary - tokyostyle - 2014-01-27

The NTT vocab test that was referenced in the off-topic English thread:
http://www.kecl.ntt.co.jp/icl/lirg/resources/goitokusei/goi-test.html

Amount of vocabulary - Vempele - 2014-01-27

tokyostyle Wrote:
    The NTT vocab test that was referenced in the off-topic English thread:

Broken. Check the last word: 33900. Check the first and the last word: 800.
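
dizmox's bag-of-balls argument earlier in the thread can be tried out with a small simulation. This is only a minimal sketch of plain random sampling, not a description of how any of the tests mentioned here work (Stansfield123 notes that the online test uses a corpus-based method rather than a simple random sample). The 100,000-word bag comes from dizmox's post; the 30,000 known words, the sample size of 100, and the names `sample_estimate` and `standard_error` are assumptions made for illustration.

```python
import math
import random


def sample_estimate(total_words: int, known_words: int,
                    sample_size: int, rng: random.Random) -> float:
    """Draw sample_size distinct words at random from a bag of total_words,
    of which the first known_words indices count as known, and scale the
    observed hit rate up to an estimated vocabulary size."""
    drawn = rng.sample(range(total_words), sample_size)
    hits = sum(1 for index in drawn if index < known_words)
    return hits * total_words / sample_size


def standard_error(known_fraction: float, sample_size: int) -> float:
    """Standard error of the estimated proportion, ignoring the finite
    population correction (negligible when the sample is far smaller
    than the bag)."""
    return math.sqrt(known_fraction * (1 - known_fraction) / sample_size)


if __name__ == "__main__":
    rng = random.Random(2014)  # fixed seed so runs are repeatable
    total, known, n = 100_000, 30_000, 100  # illustrative numbers
    for trial in range(5):
        print(f"trial {trial + 1}: about {sample_estimate(total, known, n, rng):,.0f} words")
    se = standard_error(known / total, n)
    print(f"standard error: {se:.1%} of the bag, about {se * total:,.0f} words")
```

With these assumed numbers the standard error works out to about 4.6 percentage points, or roughly 4,600 words out of 100,000. That is in line both with dizmox's "within a few percent or so" and with Stansfield123's remark that an individual result is likely off by thousands; the size of the bag barely matters, only the sample size does.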