Core 6001 to 12000 - Printable Version

+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Core 6001 to 12000 (/thread-6782.html)
Core 6001 to 12000 - BlackMarsh - 2010-11-28

About a month ago I created an Anki deck from the Core 6000 vocab list and audio files. I realised that I learn best when I hear a word rather than reading it, so the front of each card in my deck is the audio of a single vocab item, and the back is the audio of the example sentence, along with the text of the sentence and the word. Never have I been able to recall vocab so well. This method is working so well, in fact, that I think my Japanese would be at a much higher level if I had started it years ago.

Anyway, my deck includes all the words and sentences from the Core 6000 list (although I deleted the JLPT 3 and 4 vocab because I don't need it). However, 6000 vocabulary items aren't enough for me; I'd like a bank of about 12,000. I'm therefore thinking of building my own database of the remaining "common" vocab (e.g. words that appear in JLPT 1) in the same format as the Core list, including example sentences and audio. My plan is to pay a group of Japanese students to sit down and record these words and sentences, which I can then use to create the new deck. Or, instead of paying them, I could offer a few hours of free English lessons or something.

Before I go ahead with this project, though, I'd like to make sure there isn't already a deck out there covering this range of vocab. Also, if anyone wants to help me create the deck (i.e. decide which words to add) and write useful example sentences to be recorded later, let me know and I'll set something up with Google Docs.

Core 6001 to 12000 - Bokusenou - 2010-11-28

This sounds like a great idea! Have you heard of RhinoSpike.com?

Core 6001 to 12000 - Blahah - 2010-11-28

The original source of the Core 6000 vocab actually includes 10,000 words. It was compiled by Jack Halpern's CJK institute from analysis of newspapers (I think). The whole list, plus audio and sentences with sentence audio, is in the iPhone app J Sensei.
I'm working on extracting it, which would provide an extra 4000 vocab items. Since the data isn't publicly available, I think I'd only give it to people who had paid for the iPhone app. I like the sound of that deck design; I might experiment with it for a while.

Core 6001 to 12000 - Nukemarine - 2010-11-28

Here's another option (one I might do myself). This comes from a scan of about 50 million words or so. Here are the top 15,000 words (single characters and spaces removed):

https://spreadsheets.google.com/ccc?key=0AscWM0WNU3s4cHNLaUJLWDE1d20wSWVPTTNIalNoQWc&hl=en#gid=0

With some spreadsheet magic you can remove the duplicates already in Core 2k/6k, group the remaining words into sets of 4000 each, then use Cangy's program to sort those groups. After that, I'd recommend spacing the kana-only words evenly throughout the list, keeping their original order.

Core 6001 to 12000 - rachels - 2010-11-28

BlackMarsh Wrote: However, 6000 items of vocab isn't enough for me. I'd like to have a bank of about 12,000.

The corePLUS deck - available at the Anki download site - has the Core words and sentences, plus additional vocabulary (more than 20,000 items in total) taken from the words listed as common in Jim Breen's EDICT. However, it is currently undergoing a major tidy-up/rewrite, and when reposted (hopefully before the end of the year) it will also have:

- Word-list tags for Genki 1 & 2, IATIJ and Tobira, by chapter
- A more complete set of tags for JLPT 1-5
- A field listing homophones and their definitions
- A field noting whether the word is "usually kana" according to EDICT
- Tags for a large part of the vocabulary from KO
- Tags numbering RTK2 words
- The other stuff from the Kore spreadsheets

For most of the extra (non-Kore) words, sound and example sentences are available through plugins, but not sentence audio. Thanks to Cangy for the franki program - it's still taking time to process the data, though.
Last edited by rachels (8:42 am)

Core 6001 to 12000 - BlackMarsh - 2010-11-28

Thanks for the encouraging replies, everyone. So it seems there is already a solid base to work from. I'm sure that with a collaborative effort we can end up with something very useful for a lot of people - kind of like a wiki for a vocabulary and example-sentence database. Example sentences can come from ALC or Jim Breen or wherever, and I'm sure we can somehow recruit a group of native speakers to do the recordings, then set up a team to cut and edit the audio files and assess the quality of the recordings.

One thing that is slightly bothersome with the current Kore is that there is only one example sentence, for one sense, of each word, whereas many words have many subtle senses (e.g. つく, 合う). It would be good to have multiple example sentences for the different uses of a word, along with explanations of how each meaning relates to the core meaning of the word - e.g. there are many phrases that use the verb 当てる, but there is still a core sense of 当てる contained within each variant. An explanation would help solidify the nuances of such vague words.

Core 6001 to 12000 - BlackMarsh - 2010-11-28

Bah, sorry if that last post was confusing. It's hard to write a long post on an iPhone.

Core 6001 to 12000 - gyuujuice - 2010-11-28

This is quite the project - count me in! ...Well, after the JLPT. (n_n' )> I can work on the images and proofreading if you like.

Now, if this branches out from the same design as the old Core 2000 and 6000, will the grammar still be pretty basic, or will we use more complex grammar? I found it helpful that I didn't have to study both grammar and vocabulary at once, so maybe limit it to N2 grammar unless the vocabulary word is itself a grammar point, like 関する. Perhaps there could be two separate sections, like so:

- Core 2000
- Core 6000
- Core 1000
- Core 1000

Oh, and how many items would be in each step? 200? 250? 500? Just my two cents. My account is below.
(http://smart.fm/users/gyuunyuu)

Core 6001 to 12000 - BlackMarsh - 2010-11-28

Good point, gyuujuice. I like the fact that the Core sentences are basic, as you don't have to worry about complex grammatical constructions. Although, as you said, many words are grammatical elements at the same time, so it would be wise to include them. Actually, once you get past particles and politeness levels, most Japanese grammar is just words used in clever ways - e.g. 伴う is taught as vocab in Core but is mostly used as a grammatical function.

Core 6001 to 12000 - Katsuo - 2010-11-28

Another useful word/expression + example sentence database is ALC's "Daily Expressions". They are freely available on ALC's site (click 日常表現集), but I compiled them into a spreadsheet and an Anki deck (8 MB) for convenience. Features as follows:

• 6,120 words/expressions, each with an example sentence
• Each includes an extra "bonus" sentence (so there are 12,240 sentences in total)
• Each main sentence is given two possible English translations
• Explanations in Japanese

Notes:

• Compared to the Core 2000 & 6000 series, the ALC expressions tend to be more colloquial and idiomatic.
• This material is aimed at Japanese people learning English, so the explanation is in Japanese, but it's usually simple enough for English speakers to follow.
• An English translation of the word/expression is not stated separately; it is usually given as part of the explanation sentence. It would be useful for the English equivalent to be in its own "box", so I went through some of them adding this feature, but I only did a few hundred.
• These would be great with audio!

Core 6001 to 12000 - gyuujuice - 2010-11-28

"Eg 伴う is taught as vocab in Core but is mostly used as a grammatical function."

Yeah, so maybe a hybrid of both? I think that by 6001 words you should understand N2 grammar (and be moving on to N1), so maybe just exclude N1 grammar.
My goal for next year is to double my vocab, so this would be the perfect tool for it.

Core 6001 to 12000 - BlackMarsh - 2010-11-29

Katsuo, I checked out that spreadsheet and am thoroughly impressed. Although I recognise almost all the words individually, the way they are combined into idiomatic phrases makes them almost like new words. Now I'm wondering which would be more useful: learning 6000 new words, which undoubtedly become rarer and more specialised (some you might come across only in very specific fields), or building a strong bank of idiomatic phrases made from words already learned.

Core 6001 to 12000 - Thora - 2010-11-29

BlackMarsh Wrote: I realised that I learn best when I hear a word, rather than reading it, ... Never have I been able to recall vocab so well. This method is working so well, in fact, that I think if I had started it years ago my Japanese level would be so much higher.

I recently came across a study aimed at determining whether beginner students acquire vocab better using romaji instead of kana. The conclusion was no, but what the authors ended up discovering was that audio was the only technique that improved vocab retention in every group. So there you go! :-)

I think you're right, though, that an audio sentence for every word, including infrequent words you'll likely only encounter in reading, probably isn't necessary. At some point you want to be encountering words in material you enjoy, and large premade vocab lists might not have the right stuff - although I can see it being useful for JLPT preparation. (Would you need to add about 2000 words to Core 6k?) At least for personal use, consider selectively adding audio or sentences where it makes sense or where you feel you need extra reinforcement; Thurd's audio plugin is great for that. I personally aim for collocations, common expressions or related words rather than individual words and/or sentences.
Some words don't need sentences, and some ought to be learned as part of a pairing or expression. It may not look pretty in a spreadsheet, but I find it more practical.

Core 6001 to 12000 - Nukemarine - 2010-11-29

I've suggested this before, but here it is again:

1. Find your set of words, such as the top 16,000 words in blogs, sorted by frequency.
2. Break these up into groups of 2,000 each.
3. Sort each group with Cangy's program (I like KO2k1's order).
4. Distribute the kana-only words evenly throughout each group (every fifth word, for example), in their frequency order.

When you're done, you have a list of words that anyone can intuitively recognise as useful. Each set by its nature has diminishing returns, but is still more valuable than the set that follows it. In addition, the sets are arranged in an easier-to-learn order (anecdotal evidence only). From that list, it's only a matter of spreadsheet editing to add extra information, such as that mentioned by rachels above.

However, people who reach 3k+ vocabulary probably stop using word lists. People like Jarvik just use Rikaichan to copy new words they come across and add them to Anki later. Like everything else, what's essential for a beginner (a useful, organised structure for grammar and vocabulary) becomes optional for intermediate and advanced learners.

Core 6001 to 12000 - deign - 2010-11-29

BlackMarsh, that's a good idea. In my experience, the best way to memorise a new vocab word is to learn it inside a simple sentence with audio only, exactly as in Core. I use a "sentence listening" card model, which is usually very easy; once it's mastered I change the model to "sentence kanji", which shows only the kanji, and then again to "word kanji", which is the hardest level. An i+1 audio sentence helps a lot because you can rely on the other words to recognise the new one. After the first encounter, the audio stays in your head.
I don't have a lot of time, but I'm available if you need help or ideas to improve Core.

Core 6001 to 12000 - BlackMarsh - 2010-11-29

Nukemarine, that's a good way to find vocab, but I don't think it'd be a good idea to use the sentences from the blogs themselves, as they would likely be too long, complicated and context-dependent to stand alone (not sure if that's what you were suggesting). Actually, I think that approach would be brilliant for creating topic-specific databases. For example, if you were interested in Japanese mythology and wanted to read about it in Japanese but didn't have the vocab for it, you could first run loads and loads of text through the frequency program, come up with a list of the most frequent words, edit it accordingly, learn the vocab for the subject, and then dive into the original material.

Core 6001 to 12000 - BlackMarsh - 2010-11-29

OK, I'm liking Katsuo's list more and more, and I'd like to go ahead and get the recordings going. I had a good idea for recruiting native speakers, too: in return for the Japanese recordings, we do our own English recordings for the native Japanese speakers! A total win-win.

Also, although Katsuo said he had created single-word English definitions for some of the words (I couldn't see any in the spreadsheet, though?), we will need to do this for all of them. It shouldn't take too long.

Finally, as for finding native speakers to do the recordings, we can just put the word out on Lang-8 or somewhere. Volunteers would be given a list of individual words and sentences to record; they would then email one of us the audio file, and we would cut it into individual parts and put the data into the spreadsheet.

Core 6001 to 12000 - gyuujuice - 2010-11-29

"Distribute the Kana only words in each batch evenly throughout each group (every fifth word for example) in their frequency order."

Good idea here. I always thought that WRITING helped me remember better - I am kinesthetic - but the audio also helped.
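BlackMarsh's topic-specific idea above - run a pile of text through a frequency counter, subtract the words you already know, and study what remains - is only a few lines of scripting. A minimal sketch, assuming the text has already been tokenised (Japanese has no spaces to split on, so in practice the tokens would come from a morphological analyser such as MeCab):

```python
from collections import Counter

def frequency_list(tokens, already_known=(), top_n=None):
    """Rank tokens by frequency, dropping words already in your deck.

    `tokens` is assumed to come from a tokeniser such as MeCab;
    this sketch does not do the tokenisation itself."""
    known = set(already_known)
    counts = Counter(t for t in tokens if t not in known)
    return [word for word, _ in counts.most_common(top_n)]
```

Feed it every article you can find on, say, Japanese mythology, pass your Core 6k list as `already_known`, and the top of the result is a ready-made pre-study vocab list for that topic.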
(The typing is a big help to me.) I already have three or so friends who would be willing to do recordings, but I would need some contact information.
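The distribution step gyuujuice quotes, together with the rest of Nukemarine's recipe (dedupe against Core, chunk into fixed-size groups, space the kana-only words evenly in frequency order), can be sketched as below. The franki/Cangy sort is an external program, so it is left out here, and the function and parameter names are invented for illustration:

```python
import re

KANJI = re.compile(r'[\u4e00-\u9fff]')

def is_kana_only(word):
    """True if the word contains no kanji characters."""
    return not KANJI.search(word)

def build_batches(freq_sorted_words, known_words, batch_size=2000, kana_every=5):
    """Steps 1, 2 and 4 of the recipe: drop words already known, chunk
    into batches, then re-insert the kana-only words so that roughly
    every `kana_every`-th slot holds one, in frequency order.
    (Step 3, sorting each batch with Cangy's franki program, is external.)"""
    known = set(known_words)
    fresh = [w for w in freq_sorted_words if w not in known]
    batches = [fresh[i:i + batch_size] for i in range(0, len(fresh), batch_size)]
    out = []
    for batch in batches:
        kana = [w for w in batch if is_kana_only(w)]       # keeps frequency order
        kanji = [w for w in batch if not is_kana_only(w)]  # franki-sorted in practice
        merged, k = [], 0
        for i, w in enumerate(kanji, start=1):
            merged.append(w)
            if i % (kana_every - 1) == 0 and k < len(kana):
                merged.append(kana[k])   # every `kana_every`-th word is kana-only
                k += 1
        merged.extend(kana[k:])          # any leftover kana words go at the end
        out.append(merged)
    return out
```

Each batch then only needs the external sort applied to its kanji words before the interleaving step, which is plain list manipulation either in a script or with spreadsheet formulas.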
Core 6001 to 12000 - rachels - 2010-11-29

Nukemarine Wrote: From that list, it's only a matter of spreadsheet editing to add additional information such as that mentioned by Rachel above

A question for Nukemarine: while I admit I've been using spreadsheet manipulation (e.g. to create the data file for the homophones field), I have generally preferred to do most of my editing, importing, merging and regular-expression manipulation from within Anki itself. I've almost got all my data files sorted and tidied, but I'm delayed by the slowness of running overwrite-fields/franki on such a large (20,000+ card) Anki file. So what would you consider the best way to replicate the functionality of franki using spreadsheet manipulation? E.g. a large Anki file (or spreadsheet) with a unique key of the form kanji{kana}, and a smaller subset of the data with tags or new info to be merged in? Maybe import dummy data from the large file into the smaller one, so that every key is in both files and you can sort and match the rows exactly. But is there a cleverer way? For the second task, where the key is a non-unique kana field, it seems more trouble than it's worth?

I felt I ought to tidy the deck up, and, for my own use, the homophones data and textbook tags will be very useful, but once I have reposted the deck I don't intend to do anything more to it. In particular, if Nukemarine or BlackMarsh or anyone else wants to reuse the data, or to use the deck as a basis for adding extra example sentences and audio, please feel free.

From my own experience with the deck, listening to a Japanese word and then, on the answer side, listening to an example sentence works well, but there are often times when I'm rushing through and can't be bothered to listen to the sentence, and more audio example sentences per word would probably slow things down too much and result in fewer words being covered - probably not a good trade-off.
Anyway, the Tatoeba and example-sentence plugins are always available for when you want to look at more varied examples. More audio sentences would be really cool, but a big investment of time to make (a friendly warning) - though I'm sure a lot of people would use it...
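On rachels' merging question above: with a scripted join, the dummy-data trick isn't needed, because a dictionary indexed on the unique kanji{kana} key lets the small file update the large one directly while leaving unmatched rows untouched. A minimal sketch (the column name "expression" and the function name are invented for illustration; the rows would come from CSV/TSV exports of the two files):

```python
def merge_by_key(master_rows, update_rows, key="expression"):
    """Left-join: copy the extra columns from `update_rows` into
    `master_rows` wherever the unique key matches. Master rows with
    no match are returned unchanged, so the two files never need to
    be padded to the same set of keys."""
    index = {row[key]: row for row in update_rows}
    merged = []
    for row in master_rows:
        combined = dict(row)                       # don't mutate the input
        extra = index.get(row[key], {})
        combined.update({c: v for c, v in extra.items() if c != key})
        merged.append(combined)
    return merged
```

For the second task, where the key is a non-unique kana reading, the index would have to map each reading to a list of candidate rows, and some rule would be needed to pick among them - so rachels' instinct that it's more trouble than it's worth seems right.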