pmnox Wrote:
1) The code is written in Python. However, it's not written for readability. I'm not sure if it will be of any use to you, but I can share a Dropbox link to all the files that I used to generate this deck.

If you could do that at some point, I'd appreciate it. (If you didn't make your own sort script, I also need a copy of the one you used, since the original one isn't available for download anymore.) Whatever you have is better than starting from scratch; worst case, it will give me a few ideas.
Quote:
2) Cool. How is your progress on learning Japanese? Are you an advanced learner or a beginner? I heard that a lot of people stop using lists after learning 6k-15k words and then they just read native material.

I'm a beginner in terms of serious study of Japanese, but it'll be the third language I've learned as an adult. I've watched Japanese-language anime for years and knew the kana, but my vocabulary is small and I didn't start learning kanji until recently. Because of how I approach projects like this, I learned a good bit about second language acquisition in the past, and I'm trying to apply that knowledge to improve the resources available for Japanese in advance of when I'll need to use them.
Quote:
3) I added all the features that I wanted. I would probably have removed the production cards if I were to do this deck again. I don't find them useful anymore. I'm not even sure if studying more than 10k cards out of context is useful. When I see a new word, I simply enable its card from the 10k-25k card list and add it to the list of all cards I learn.

They generally say that learning up to 98% coverage with flashcards is very helpful (at least for adult learners). So, at least in theory, it should be worth the effort to learn the first 20k words from this list with flashcards. The next 11k may or may not be, depending on what you want to do, but it's probably better to do what you are already doing with them.
Quote:
4) I used K02k1, because they ordered the kanji in a logical way. Kanji like 0, 1, 2, ..., 9 appear together, as do kanji with similar meanings, like left, right, etc. I didn't want to use the RtK order, because I learned all the kanji before I started studying this list. I still think that trying to learn new kanji at the same time as learning to read is counterproductive. At the very least, my attempt to learn that way a few years ago failed.

When you say "in a logical way", do you mean it orders them in a way that helps you learn readings (like RtK2), or just in a way that kanji related in meaning appear near one another?
I wasn't as clear as I could have been above. My thought was to make a "fixed" RtK2 in the following way. The kanji in RtK2 are either assigned to a single primitive group or they are not. So if I ascribe the total frequency of a group's members to that group (and just look at the individual frequencies of the remaining kanji), I can then sort by frequency and put the highest-frequency group or individual kanji first, followed by the next highest, and so on. Within each group, I could put all the regular kanji in frequency order, followed by the exceptions in frequency order. That would give a mostly frequency-based sort order for ~3000 kanji that groups things by on-yomi reading as much as possible.
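To make sure I'm describing the sort clearly, here's a rough Python sketch of it. Everything in it is illustrative: the group assignments would really come from RtK2's primitive groups and the frequencies from whatever corpus list gets used.

```python
# Sketch of the group-frequency sort described above.  The data is
# made up; real input would be RtK2 group assignments plus a corpus
# frequency list.

# kanji -> (frequency, group name or None, is an exception within its group)
kanji_info = {
    "一": (500, "itsu-group", False),
    "壱": (5,   "itsu-group", True),
    "左": (80,  None, False),
    "右": (90,  None, False),
}

def sort_units(info):
    # A "unit" is either a primitive group (scored by the total
    # frequency of its members) or a lone kanji (scored by its own
    # frequency).  Units are emitted in descending score order;
    # within a group, regular members come before exceptions, and
    # each sub-list is in descending frequency order.
    groups, singles = {}, []
    for kanji, (freq, group, is_exc) in info.items():
        if group is None:
            singles.append((freq, [kanji]))
        else:
            groups.setdefault(group, []).append((kanji, freq, is_exc))
    units = list(singles)
    for members in groups.values():
        total = sum(f for _, f, _ in members)
        regular = sorted((m for m in members if not m[2]), key=lambda m: -m[1])
        exceptions = sorted((m for m in members if m[2]), key=lambda m: -m[1])
        units.append((total, [m[0] for m in regular + exceptions]))
    units.sort(key=lambda u: -u[0])
    return [k for _, kanji_list in units for k in kanji_list]

print(sort_units(kanji_info))  # → ['一', '壱', '右', '左']
```

With the toy data, the 一/壱 group's combined frequency (505) beats both singletons, so the whole group comes first even though 壱 on its own is rare.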
Ideally, I'd improve the sort script to look at the furigana and figure out which reading each kanji has in each vocab word, so that I could sort in a way that groups the on- and kun-readings separately and puts kanji with multiple readings in the right place in the sort order for the reading being used. But I'm not sure how feasible this is yet. (Conceptually, it seems like it wouldn't be that involved, but maybe I'm wrong...)
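The simplest version of that reading-attribution idea I can picture is a greedy segmentation against a per-kanji readings table (something like KANJIDIC in practice; the tiny table below is a stand-in). It deliberately ignores rendaku, sokuon, and okurigana, which is exactly where a real script would need more work:

```python
# Attribute a word's kana reading to its individual kanji by trying
# each candidate reading in turn and recursing on the remainder.
# READINGS is a tiny illustrative stand-in for a real dictionary
# such as KANJIDIC; sound changes are ignored on purpose.

READINGS = {
    "学": ["がく", "まな"],
    "校": ["こう"],
}

def attribute(word, kana):
    # Returns a list of (kanji, reading) pairs, or None if the kana
    # can't be segmented using the table above.
    if not word:
        return [] if not kana else None
    head, rest = word[0], word[1:]
    candidates = READINGS.get(head, [head])  # kana chars read as themselves
    for reading in candidates:
        if kana.startswith(reading):
            tail = attribute(rest, kana[len(reading):])
            if tail is not None:
                return [(head, reading)] + tail
    return None

print(attribute("学校", "がくこう"))  # → [('学', 'がく'), ('校', 'こう')]
print(attribute("学校", "がっこう"))  # → None: がっ is a sokuon form of がく
```

The second call failing is the instructive part: the actual reading of 学校 uses a sound change the naive matcher can't see, so the real script would need a small set of transformation rules on top of this.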
So, how do you think this stacks up against the KO sort order? My concern is that I may end up trading one kind of interference (too many semantically related words occurring close together) for another (too many similar-sounding words occurring close together). But I'm thinking this may not be as big a problem because of the kanji.
Quote:
5) Do you want to have RtK mnemonics for all the kanji in each word? Or are you trying to do something different?

Well, there are several types of mnemonics you can use to learn the vocabulary itself. For some words, you can decompose the word into parts (kanji in this case) and come up with a visualization that links the parts together to form the meaning of the compound. For words where that doesn't work, you can come up with an English keyword that sounds similar to the vocabulary word and then use a visualization that links the two. There are some other, more exotic techniques as well. (The one used in the kun-yomi chapter of RtK2 is a good example.)
I'd like to have a set of mnemonics and mnemonic aids for all the vocabulary that isn't transparently related to the meaning of the underlying kanji. But like I said, I don't think this kind of thing exists. (If the Michel Thomas people ever make a "vocabulary builder" for Japanese, it will have a lot of these.)
Edit to update status:
I'm in the process of comparing the different frequency lists we have available (BCCWJ, VDRJ, and the community-created one that's floating around) to try to make a master list that we can generate a new set of decks from. My current plan is to start with VDRJ's three lists and then incorporate any high-frequency words from the other lists that aren't in the VDRJ lists. I'm also going to generate a "supplemental vocabulary" list by taking the words that aren't in the main list and identifying the highest-frequency remaining ones in each sub-category. (My experiments show that around 750 words cover between 1/3 and 1/2 of the unknown words in a given field, so these words are worth learning if you want to read materials in that specific area.) I think this is more promising than just trying to incorporate the next 10k or so words from the general list, since those 10k only add about 1% vocabulary coverage, while for a given area you can get similar results with fewer than 1000 words.
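The merge step I have in mind is basically "VDRJ order first, then append anything high-frequency elsewhere that VDRJ missed." A toy sketch (the word lists and the cutoff are made up; the real script would work from actual ranks or frequencies):

```python
# Sketch of the master-list merge: keep the base (VDRJ) ordering and
# append words that rank highly in another list but are absent from
# the base.  All list contents here are illustrative.

vdrj      = ["の", "する", "こと", "ある"]
bccwj     = ["の", "する", "これ", "こと"]
community = ["の", "ある", "もの", "する"]

def merge(base, others, cutoff=3):
    # 'cutoff' = how deep into each other list counts as "high
    # frequency" (a toy threshold; a real script would compare
    # actual frequencies or ranks instead).
    seen = set(base)
    merged = list(base)
    for lst in others:
        for word in lst[:cutoff]:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged

print(merge(vdrj, [bccwj, community]))
# → ['の', 'する', 'こと', 'ある', 'これ', 'もの']
```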
I'm also working on figuring out what kind of frequency coverage the Core decks actually give and where their words fall relative to the major frequency lists. (What percent of Core 2k/6k/10k is in the 2k/6k/10k most frequent words? What percent of the most frequent words are in the different levels of Core? What about the CorePlus stuff? Etc.)
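Both of those questions reduce to set overlap between a deck and the top-N of a frequency list. A sketch, with placeholder word lists standing in for the real deck exports:

```python
# What fraction of a Core deck falls inside the top-N of a frequency
# list, and what fraction of that top-N the deck covers.  The word
# lists below are placeholders for the real deck/frequency data.

def overlap_stats(deck, freq_list, n):
    top_n = set(freq_list[:n])
    deck_set = set(deck)
    in_top = len(deck_set & top_n)
    return (in_top / len(deck_set),  # share of the deck in the top-N
            in_top / n)              # share of the top-N covered by the deck

core = ["犬", "猫", "走る", "赤い"]
freq = ["の", "犬", "猫", "する", "赤い", "走る"]
print(overlap_stats(core, freq, 4))  # → (0.5, 0.5)
```

Running it at n = 2000, 6000, and 10000 against each frequency list would answer the Core 2k/6k/10k questions directly.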
MaxHayden Wrote:
(FWIW, I also contacted the publishers of the two printed graded reader series to see if they had a vocab list so that I could add tags for people who wanted to use them, but if they don't have the lists pre-made, I'm not going to include that information unless someone else goes through the books and makes the list for me.)

The people at NPO多言語多読 (Japanese Extensive Reading) have a vocabulary list. However, they only make it available to authors, not to instructors or even to the publishing companies. (So ASK uses the same vocabulary list but doesn't have a copy of what the authors used.) They do this because they want to discourage people from deviating from the goals of extensive reading. Consequently, they politely requested that we *not* go through their books and compile a vocabulary list from the contents.
I'm not really sure how I feel about this, because the second language acquisition literature says that just *having* the list to draw your attention to the new words will improve learning. (That's why most English-language graded readers include one.) Similarly, being able to activate Anki cards based on what you just read would probably be helpful. That said, manually compiling the list from the books would take more work than I'm willing to put in. If someone has a suggestion for how to automatically generate a vocab list from the physical books, I'm open to trying it. (The readers are all very short, so even though there are a lot of them, scanning them wouldn't take that long.) Otherwise, I'm not going to do it, but if someone else is willing, I'm open to discussing whether or not we should honor their request.
Edited: 2014-07-09, 1:52 pm
