Is it possible somehow to see the most frequent katakana words?
EDIT: If I search with kana or kanji, (P) doesn't work.
Edited: 2011-02-21, 11:07 am
Yufina Wrote: Is it possible somehow to see the most frequent katakana words?
To see the most frequent katakana words, check "RegEx" and then copy-paste the following regular expression into the search box:
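The regular expression itself appears to have been lost from the post. A plausible reconstruction (not the original) is a pattern that matches entries consisting entirely of katakana; here is how such a pattern behaves, demonstrated in Python:

```python
import re

# Hypothetical reconstruction: the original regex was lost from the post.
# This pattern matches entries made up entirely of katakana (U+30A1..U+30F4),
# the prolonged-sound mark "ー", and the middle dot "・".
KATAKANA_ONLY = re.compile(r"^[ァ-ヴー・]+$")

words = ["コーヒー", "テレビ", "日本", "ひらがな", "デ・ジャヴ"]
katakana_words = [w for w in words if KATAKANA_ONLY.fullmatch(w)]
print(katakana_words)  # ['コーヒー', 'テレビ', 'デ・ジャヴ']
```

Adjust the character ranges if your tool's regex engine uses a different syntax for character classes.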
Yufina Wrote: EDIT: If I search with kana or kanji, (P) doesn't work.
You're right! Thanks for the bug report! JLPT doesn't work either.
Yufina Wrote:Thank you!
Some JLPT entries are wrong. You can find the most up-to-date JLPT vocabulary for N5 and N4 here: http://www.jlptstudy.com/N5/
http://www.jlptstudy.com/N5/ Wrote: As of 2010, there is no official vocabulary list. This list, then, is an approximate guide that is likely to match requirements for the Level N5 exam.
I don't suppose somebody knows of a more accurate/official listing of JLPT vocabulary?
nest0r Wrote: The Text Glossing section here: http://www.csse.monash.edu.au/~jwb/wwwjd...ticle.html
I haven't read the entire article yet, but doesn't overture(?) have a tool that does that?
It looks like the tools for a standalone version are already available in various forms, so it should be doable to add this functionality to offline programs. The various pieces being: jReadability (for character-sequence scanning), Rikaisan or something similar (deinflection tables), and offline dictionaries.
cb4960 Wrote: Interesting. A purely offline version is definitely possible. As you state, rikaichan already does inflection, so that's free code. Using cbJisho's frequency list, it might even be better than the rules in the article. And of course, the dictionary files are easily downloadable and free. You could parse just what you need or even entire files and create something similar to those reading-real-Japanese books where the Japanese is on one side and the gloss is on the other. It could even be integrated with rikaisan as a "Gloss Mode" that highlights entire sentences and glosses them up.
Yess! ... *steeples fingers*
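For anyone curious what such a "Gloss Mode" would involve, the core of it is a greedy longest-match scan of the text against a dictionary. A minimal sketch, where the toy DICT stands in for a real EDICT-style dictionary and deinflection (which rikaichan handles) is omitted for brevity:

```python
# Minimal sketch of dictionary-based glossing via greedy longest-match scanning.
# DICT is a toy stand-in for a real EDICT-style dictionary file.
DICT = {
    "日本語": "Japanese (language)",
    "日本": "Japan",
    "を": "(object particle)",
    "勉強": "study",
    "する": "to do",
}
MAX_LEN = max(map(len, DICT))

def gloss(text):
    glosses, i = [], 0
    while i < len(text):
        # Try the longest substring first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:j] in DICT:
                glosses.append((text[i:j], DICT[text[i:j]]))
                i = j
                break
        else:
            i += 1  # no dictionary match: skip one character
    return glosses

print(gloss("日本語を勉強する"))
```

Because the scan is greedy, 日本語 is matched as one word rather than as 日本 plus a stray character, which is the behavior you want from a glosser.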
nest0r Wrote: ... perhaps it could be compared against a user database... perhaps Anki information (seen cards, cards of a certain maturity, whatever)...
- http://forum.koohii.com/showthread.php?p...1#pid93331
cb4960 Wrote: ... You can have a scenario where a beginner wants to know which of, say, 5000 novels would be a good one to read with regard to that i+1 thingy...
- http://forum.koohii.com/showthread.php?p...9#pid91729
Netbrian Wrote: I absolutely love the word frequency analyzer, and am using it extensively to generate flashcards for any particular corpus I might want to read later.
I'm glad that you're finding it helpful. Unfortunately, it's not currently a good tool for analyzing conjugation frequency. Maybe I'll add support for this in the future (no promises, though).
Is it possible to use the utility to figure out which adjective or verb conjugations are most common in a given set of documents, to help focus other parts of my study?
Thank you for all your wonderful work!
anritsi Wrote: Can you add an option to append an entry to a certain file, instead of overwriting everything?
Sure, that should be simple enough. I can probably get to it in 2-3 weeks.
*cough* I know, I'm lazy. But this software already does everything else I want it to do, lol...
Zarxrax Wrote: This looks interesting, I'll give it a try later.
Makes me wonder if an alternate 'Core' list could be compiled and checked against the Core 6000/10000. I remember discussions on iKnow about the last few items of Core 6000 being dated terms. Here is a comment from one of the older users of iKnow:
Is it possible to browse words by frequency, rather than searching? (basically to obtain a list of vocab to learn)
Rainer Konowski Wrote: As far as I know, the Core items were licensed from the CJK Institute (which is probably best known for "The Kanji Learner's Dictionary"). There are 10,000 items in total, which can be found in the "Japanese Sensei" iPhone app. There is no further information on the CJK Institute's pages, so you can only guess where it comes from.
Interestingly, 日ソ is ranked as the 35,000th most common word by Rikaisama...
I think that neither the selection nor the order of the words makes sense for any kind of source. As for the newspaper hypothesis: one of the surprising omissions is the word 日本 (Japan), which you could easily find in any newspaper. In contrast, the Core items contain the odd word 日ソ (Japanese-Soviet; this was later edited out by Cerego). This points back to the 80s - maybe the Core set was compiled without the help of computers, which could explain some of its shortcomings.
Some notable omissions that I remember right now: almost all expressions, interjections, and greetings (did you know that はい means yes?); everyday words like コンビニ (kombini, convenience store) and 上履き (uwabaki, indoor shoes/slippers); place names like Tokyo, America, and Mount Fuji; Japanese-culture-specific things like okonomiyaki (a pancake-like food) and 剣道 (kendou, hitting others with a wooden stick). Very common words like 小さな/大きな (small/big). Language learners would also want to see grammar patterns like ~や~など (and so on) or しか~ない (only); nothing like that is in the lists. Another problem is that many example sentences are unnecessarily abstract and do not show a typical use of the word in question. I personally would prefer to order items first by topic and secondarily by frequency. That automatically gives some context and makes learning easier.
I agree with Russ that you should ignore the prescribed order and just pick the words that make sense to you and ignore the others. "The final step of the Core 6000" (if that is meant literally) actually contains many useful words. The rare words are spread all over the place.
If I were Cerego, I would simply add the missing words. They have a mixed Japanese-American team, so what's the problem? All it takes is some common sense and a microphone. Competitor japanesepod does that all the time.
danieldesu Wrote: I haven't tried your tool yet, but I was thinking about things that might be useful in a tool like this. My thought was that for each word, you could analyze the surrounding words to determine whether they appear with higher relative frequency than in the rest of the document. This would indicate that certain words tend to appear together.
What you describe is a standard concept in computational text analysis known as a "collocation". I learn Japanese based on collocations as a matter of routine.
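For the curious, collocation candidates are commonly scored with pointwise mutual information (PMI), which measures exactly the "appears together more often than chance" idea described above. A minimal sketch over pre-tokenised text (real Japanese text would first need a segmenter such as MeCab; all names here are illustrative):

```python
import math
from collections import Counter

def collocations(tokens, window=2):
    """Score word pairs within `window` by PMI: log p(x,y) / (p(x) * p(y))."""
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            pairs[(w, v)] += 1
    n = len(tokens)
    scores = {}
    for (x, y), c in pairs.items():
        # PMI simplifies to log(c * n / (count(x) * count(y)))
        scores[(x, y)] = math.log((c / n) / ((unigrams[x] / n) * (unigrams[y] / n)))
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy demo: 挙動 and 不審 always co-occur in this tiny "corpus".
scores = dict(collocations("挙動 不審 な 男 が 挙動 不審 に 動く".split()))
print(round(scores[("挙動", "不審")], 3))  # ≈ 1.504, i.e. log(4.5)
```

Note that raw PMI overweights pairs that occur only once, so in practice you would also filter candidates by a minimum pair count.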
louischa Wrote: @Christopher: Thanks for cbJisho, I had been looking for a tool like that for a long time.
cbJisho isn't really the right tool for this task. However, a while back I wrote a program for just this purpose. I'll see about polishing it up a bit and releasing it in a new thread. Here is the readme that I wrote for myself at the time:
However, I can't make it do what I want - I am writing this to ask whether it is because I do not know the proper way, or because your tool cannot do that.
I am trying to generate lists of, say, the N most frequent words containing a given character, for each character in, say, Heisig 1 - X, not using any character outside that list (namely from Heisig X+1 to 3000 or so), and if possible excluding duplicates. For instance, 一番 is found in the query for 一, but is also found in that for 番. I want it to appear only once in the results.
First, I can't coax cbJisho into saving only N words - for 一, for instance (Heisig 1), I can change the options text file to display only N, but when I press Ctrl-D (or Ctrl-S), it saves all 1900 or so results, not just the first N (is that the behavior you intended?). Obviously, I can select the first N definitions from the screen output with the mouse, but that becomes too labor-intensive.
Second, is there a SQL query I can use to batch query Heisig 1 - X, excluding X + 1 to 3000 or so?
御苦労様! (Thanks for all your hard work!)
Quote: The generated list is meant for RTK learners (non-lite). The list includes every character in RTK (3007 total). The list is sorted by the Heisig number of the kanji.
And here is the file that was generated:
Up to 5 entries are associated with each kanji. Each entry contains 4 fields: word, reading, word frequency, and definition.
If possible, the word field will consist of only kana and characters that have already been studied by the learner. For example, 径 is Heisig number 882. The words given for this character will be limited to characters in Heisig numbers 1-882.
Words that are more frequent will be used over words that are less frequent. No duplicate words are used. All fields are separated by tabs.
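The selection rules described above (frequency order, up to five words per kanji, only kana plus already-studied kanji, no duplicates) can be sketched roughly as follows. HEISIG and FREQ_LIST are tiny illustrative stand-ins for the real RTK ordering and frequency data, not cb4960's actual code:

```python
# Rough sketch of the word-selection rules described above.
HEISIG = ["一", "二", "口", "日"]              # kanji in Heisig order (stand-in)
FREQ_LIST = [                                   # (word, reading, freq), most frequent first
    ("一", "いち", 1000), ("一日", "いちにち", 800), ("二", "に", 700),
    ("口", "くち", 600), ("日", "ひ", 500), ("二日", "ふつか", 300),
]

def is_usable(word, studied):
    """A word is usable if every character is kana or an already-studied kanji."""
    return all(ch in studied or not ("\u4e00" <= ch <= "\u9fff") for ch in word)

def build_list(max_per_kanji=5):
    used, out, studied = set(), {}, set()
    for kanji in HEISIG:
        studied.add(kanji)
        picks = []
        for word, reading, freq in FREQ_LIST:   # already sorted by frequency
            if kanji in word and word not in used and is_usable(word, studied):
                picks.append((word, reading, freq))
                used.add(word)
                if len(picks) == max_per_kanji:
                    break
        out[kanji] = picks
    return out

# Emit the tab-separated fields described above.
for kanji, picks in build_list().items():
    for word, reading, freq in picks:
        print(kanji, word, reading, freq, sep="\t")
```

Note how 一日 is listed under 日 rather than 一, because at Heisig #1 the character 日 has not yet been studied; this mirrors the 径 example above.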
louischa Wrote: AntConc is another tool that one can use for collocations. Here is nest0r's explanation of its usage:
My workhorse is ALC (http://www.alc.co.jp/). You query any word/kanji, say 挙, and then you eyeball the results. Then, boom, on page 2, you notice clusters such as 挙動 and 挙動不審. These are what I memorize. I've been doing this for about a year.
Collocations are **the** way to go for learning verbs, as typical verb objects (and the relevant particle) appear in ALC. Japanese verbs are involved in all sorts of idioms that are unpredictable/surprising to English speakers.
There is a book that I highly recommend: "Common Japanese Collocations: A Learner's Guide to Frequent Word Pairings" by Kakuko Shoji, but I tend to use it less than ALC, since I dislike having to type out the book's expressions, and the book is much too short.
Obviously, a software tool that did this job would be extremely useful. That said, I think some degree of human intervention is needed to separate the wheat from the chaff once frequent pairings are found.
Relevant corpora are important when looking for collocations. For some strange reason, ALC uses many Sherlock Holmes novels (in Japanese translation), and these usually produce the weirdest results. Still, it is spot on most of the time.