Core 10k - optimized i+1 version - Printable Version

+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Core 10k - optimized i+1 version (/thread-11095.html)
Core 10k - optimized i+1 version - sunehiro - 2014-04-11

I also removed audio vocab at first, but then when I reintroduced it I found it very helpful. Now I'm doing Core 10k (the last 4k), but I'm a bit worried about the quality of the selection of words. They seem pretty... unuseful and unfashionable compared to the first 6k.

Core 10k - optimized i+1 version - Northern_Lord - 2014-04-12

To those of you who are doing this deck, I have a question. I tried this deck, but from the very beginning I was having huge problems because the deck introduced lots of words with similar or identical meanings right after each other. When I was introduced to 何しろ、なんとなく、なんとか、どうにか、なんだか (may not have been exactly those words, I don't quite remember) right after each other, I had severe problems keeping them distinct. Isn't anyone else experiencing this? I also imagine words like 資産 and 財産 or 防止 and 予防 appearing really close to each other. For my part, I quickly left this deck for one I think was a lot better for me personally.

Core 10k - optimized i+1 version - supermancampus - 2014-05-12

Hello All, Quick question. As I learn words "in the wild", is it possible to un-suspend those words in a way that doesn't break the method the deck is using to help you learn? I'm adding words to a second daily deck, only to find the same word is already part of the 10k when I get to it. Thanks.

Core 10k - optimized i+1 version - Dustin_Calgary - 2014-05-12

supermancampus Wrote: Hello All,

Well you're either going to get the benefit of i+1, or unsuspend them as you find them. Unsuspending a word as you find it in the wild may well introduce other facts you may want to learn at the same time, which goes against the point of the deck, but there is no reason why you can't do that, even if you're still progressing through the deck as well.

Core 10k - optimized i+1 version - supermancampus - 2014-05-12

Good point. Thanks for the advice.
Core 10k - optimized i+1 version - egoplant - 2014-05-12

I'm really confused about this deck: https://ankiweb.net/shared/info/1132075078 Some words don't even have any definitions for them. Some seem really unused.

Core 10k - optimized i+1 version - Thora - 2014-05-12

The description in your link indicates that it is a deck made up of 15,000 words from the so-called Coreplus deck (all the P-labeled words in Edict: more than 20k) which are NOT included in the core10k list. The deck creator suggests unsuspending words as you encounter them in your reading. In other words, it is not a Core 10K deck (the topic of this thread).

If by "unused" you mean "rare", well they are apparently common enough to be categorized as Popular in Edict but didn't make it onto the Core10K frequency list.

Core 10k - optimized i+1 version - egoplant - 2014-05-12

That deck is actually linked in the OP. I'm just wondering why there are some words that don't even have definitions for them.

Core 10k - optimized i+1 version - Thora - 2014-05-12

I see. Sorry. I assumed you had found what you thought was a Core10K deck and were wondering why the words seemed relatively uncommon. I now realize that the OP created both decks, so you're asking here.

(Coincidentally, I think the word "not" may have been left out in the OP: "Here is the supplement, an additional set of 15 cards that are [not] in core 10k taken from Tanuki-Ultima deck and CorePlus deck." Also, the Tanuki stuff was apparently later removed.)

Core 10k - optimized i+1 version - egoplant - 2014-05-12

I already have around 12k words in a deck, but I was just looking for a word list to import so I could unsuspend some words if I didn't have enough for the day from reading. I'm just wondering if this is a good list to add random words from, but maybe it's not supposed to be used like that. If anyone could suggest a similar word list I would appreciate it.
Core 10k - optimized i+1 version - egoplant - 2014-05-15

By the way, there is an error in the supplement deck that affects 98 cards. Anything ending with the English definition of "p)" is changed to "PPP" for some reason.

http://i.imgur.com/sfRqUlm.png
http://i.imgur.com/OJZ5YM6.png
http://i.imgur.com/JCenKC8.png

Core 10k - optimized i+1 version - john_sabater - 2014-05-24

Can I have a list of the non-RTK 1 kanjis in this Core 10k deck? I would like to study them beforehand in kanji koohii. Thank you

Core 10k - optimized i+1 version - qwertyytrewq - 2014-05-28

Wow, I'm making faster progress than I thought. As I mentioned in another thread: http://forum.koohii.com/showthread.php?pid=207083#pid207083

qwertyytrewq Wrote: I do have a complaint: I'm using Ankidroid and it says that I have 1000+ new cards left so I have nearly no idea how much I have left. My milestone will be when that 1000+ turns into 999.

The problem with this Core10k deck is that each vocab consists of two cards: recognition and production. Based on my experiences, deleting one (EG production) also deletes the other at the same time. I dunno why. So I simply disable the production deck. The problem is that Anki still says that each vocab has two cards (prod and reco) even though I disabled one (production). Which means I have a total of about 7500 cards when there should be about 3700 (the ones after Core6k).

As mentioned above, Ankidroid brings in a new problem: when telling me how many new cards there are, it says I have 1000+ new cards (in reality Ankidroid thinks I have "7500" new cards rather than 3700). I started Core10k late March and 2 months have passed and for the entire time, I've been Anki-ing this deck blind (not knowing how many new cards are left).

Well, today I was wondering how f******* many there are left, I don't want to be kept in the dark anymore. So I moved my Core10k deck to desktop Anki to check. Turns out I passed the "1000 new cards left" milestone long ago.
Two months since I started this deck, I now have 800 left. Since it's downhill from here (and now that I actually know I'm going downhill) it's doing wonders for my motivation and self-encouragement.

So I started with 3700 recognition cards and as I went through them, I deleted the obvious katakana words (about 200 of them) so at the moment I have a total of 3500 mainly Kanji/Hiragana words and katakana words are popping up rarely now. I have 800 left so that means I did 2700 new cards in about 60 days. Wow, that's an average of 45 new cards per day without missing any reviews whatsoever (average of about 200-250 reviews, about 300 on a bad day - by the way, these reviews include my custom cards which number about 1500). At the moment, I'm going at a steady 20-25 new cards so 45 sounds high. Perhaps I am mistaken about my starting date?

800 new cards left / 25 cards per day = about 30 days before I finish Core10k. I see no reason why that can't happen apart from things like death.

First there was Heisig in 3 months (then a few crazy people apparently did Heisig in 3 weeks). Then there was Benny Lewis's "Fluent in Japanese in 3 months" (which he failed). Now there's Core10k (3700 cards) in 3 months. Good or no?

I wonder how long it took me to do Core6k? I'll have to check my old posts on this forum. Anyone want to help me find out how long it took me to do Core6k? I know I made an announcement after I finished but I don't remember which thread.

Core 10k - optimized i+1 version - Dustin_Calgary - 2014-05-28

qwertyytrewq Wrote: Based on my experiences, deleting one (EG production) also deletes the other at the same time. I dunno why. So I simply disable the production deck.

I think deleting the individual flashcards amounts to deleting the fact itself, which deletes the associated cards. While browsing the cards of your core deck, you have the "fields" and "cards" buttons.
"Cards" allows you to change the layout of the production and recognition cards, but from here if you use the red X button to get rid of the production layout, all of those cards will be removed, while leaving your facts and recognition cards intact. It was one of the first things I did with this deck.

Core 10k - optimized i+1 version - Vempele - 2014-05-28

qwertyytrewq Wrote: Based on my experiences, deleting one (EG production) also deletes the other at the same time. I dunno why.

Because you're deleting notes, not cards. If there's a way to delete individual cards, it's not in the documentation, but you can delete the entire card type in (review)->Edit->Cards...

Core 10k - optimized i+1 version - qwertyytrewq - 2014-06-25

Dustin_Calgary Wrote: "Cards" allows you to change the layout of the production and recognition cards, but from here if you use the red X button to get rid of the production layout, all of those cards will be removed, while leaving your facts and recognition cards intact.

I only have about 100-200 left until I finish 10k but I only just now got rid of them pesky production cards. So a long belated thanks for that.

In other news, thanks to Core10k, I now know 5 different ways to say fish, in order from when they appear in Core:

魚(さかな)
魚(ぎょ)
漁(りょう) This is more the act of fishing than the fish itself. Kanji is slightly different but might as well include it.
魚(うお)
フィッシュ This is not in Core but might as well include it in this list.

Apart from 漁, I can't tell the difference. I don't care either, for now at least. Then you've got the common fish names like tuna, mackerel, carp and whale (yeah I know, but it has the fish radical anyway).

The end is near.

Core 10k - optimized i+1 version - DrJones - 2014-06-25

さかな fish who has been fished.
うお fish that lives underwater.
ぎょ 'fish' type/genus.

That's my guess.
Core 10k - optimized i+1 version - MaxHayden - 2014-06-27

Could someone please explain, or link to a post explaining, the mechanics of how this deck gets sorted to be i+1? (i.e. what happens during step 3 of Nukemarine's explanation above?)

I'm trying to understand this in order to better align my KO-optimized RTK deck, but I don't really follow the logic that's being used here. For example, why is the third card 一般? Specifically, why does the sort put this vocab under 一 instead of 般? I would have thought from the description of kanji-sort that it would have done it the other way since 般 is the more difficult of the two kanji.

Also, is there a working link for the kanji-sort.pl perl script used to generate this deck? What about for the ko2k1 kanji list and the longer 6k kanji list used as input?

Finally, for the people who have done this deck, how big of a problem is semantic interference? (i.e. how often are you trying to learn several vocabulary words that are in the same "category" -- colors, types of fish, opposites like long and short, words with related but subtly different meanings, etc.?) According to the research, morphologically-related learning helps, but semantically-related learning approximately halves the number of new words you can learn at once. I know this seems to be a fairly significant problem with the normal (frequency-based) core order, and I'd like to know how far this sort method goes towards fixing it. I would think that sorting this way would probably eliminate it almost entirely, but I haven't looked through all 10k entries. Is the problem mostly fixed by this deck, or is it still enough of a problem that it would be worth trying to think of a way to get the morphological benefits without the semantic interference?

Core 10k - optimized i+1 version - Vempele - 2014-06-27

MaxHayden Wrote: For example, why is the third card 一般? Specifically, why does the sort put this vocab under 一 instead of 般?
I would have thought from the description of kanji-sort that it would have done it the other way since 般 is the more difficult of the two kanji.

There's no 般 for it to be put under - it's the only word containing 般 in Core2k.

Core 10k - optimized i+1 version - qwertyytrewq - 2014-07-06

振る舞う

When it comes to two-verb compounds, shouldn't it be masu-form/verb-stem/nominalized verb form + plain/dictionary form? So 振り舞う, similar to 取り引き、書き直す、やり直す、吹き出す、話し出す、飛び上がる, etc.

Or is this some kind of exception, or is this kind of verb common and I have simply completely forgotten other verbs like it?

Core 10k - optimized i+1 version - Vempele - 2014-07-06

It makes more sense if you think of it as coming from 振るう (shortening 振るい舞う).

広辞苑: (一説に、語源は「振ひ舞ふ」で、鳥が羽を振るい自在に空を舞うことという)

Core 10k - optimized i+1 version - MaxHayden - 2014-07-07

So in another thread, it was pointed out that the Core lists used for this deck are not all that good in terms of frequency coverage. I contacted Dr. Tatsuhiko Matsushita at the University of Tokyo and spoke to him about this. He suggested using his newer list that's based on a large collection of Japanese books plus forum posts and other "casual speech" to make sure that you don't miss vocab that's used in less formal situations. (He also has a kanji list.)

I'd like to take the cards from pmnox's two decks, incorporate Prof. Matsushita's frequency information, regroup according to the new frequency order, and then optimize-sort, etc. Unless someone here has a better suggestion, I'm going to redistribute the cards as well, so that there's a Core20k deck for getting to 98% coverage (and thus being able to read native materials) and a Plus deck that includes the extra 11k you need to get to 99% coverage (as well as anything left over from the Core and CorePlus decks that isn't in that 31k).

So what I want to know is this:

1) The decks in this thread have a frequency field.
So, does pmnox want me to try to merge the new information in a way that would make it easy for him to incorporate Prof. Matsushita's information into his decks as well? Or should I just do it in a destructive manner that might make it harder to back-port Matsushita's frequency rank and specificity information?

2) Could the people in this thread explain to me how the fields in the original Core and CorePLUS decks got filled in? It seems unlikely that these were all done manually and so I suspect that various pieces of software were used to insert furigana, grab pictures and audio, get example sentences and cloze them, etc. Is there a thread somewhere that explains how all of this was done? If not, can people provide links to the proper resources? (Nukemarine seems to be well informed about this stuff, so maybe he knows.)

I ask because there are going to be entries in Matsushita's list that aren't in these decks, and unless I get help, I'm not going to want to take the time to manually fill them in until I get to that point in my studies. And I'd like to avoid reinventing the wheel as much as possible while doing this.

Core 10k - optimized i+1 version - pmnox - 2014-07-07

MaxHayden Wrote: So in another thread, it was pointed out that the Core lists used for this deck are not all that good in terms of frequency coverage. I contacted Dr. Tatsuhiko Matsushita at the University of Tokyo and spoke to him about this. He suggested using his newer list that's based on a large collection of Japanese books plus forum posts and other "casual speech" to make sure that you don't miss vocab that's used in less formal situations. (He also has a kanji list.)

Hi MaxHayden,

It's cool that you're working on improving the core list deck.

The first 10k entries were ripped from an iPhone application, so the audio in them is prerecorded. The examples in there are also from that application. The list that I used for the next 15k is a list of words that I got from another deck.
I generated all examples and other data using scripts that I wrote. They are not generally accessible. I have never published them anywhere.

I agree that the list above 10k isn't that good in terms of frequency. If I were to make this list again I would use a better list if I could.

If you have any questions about the deck let me know. The sorting order that I used isn't perfect. There are some kanjis that appear only once, sometimes in popular words like ippan. I tried to avoid situations where at the end of each block of 2k words we would be stuck with only hard kanjis that have no connection to each other.

To answer your questions:

1) It's up to you how you prefer to do this. My proposal would be to replace the list of words from 10k...25k with words from the Matsushita list that don't appear anywhere in the first 10k words. What is your goal? How long is the Matsushita word list? Do you want to create the list from scratch or to extend the existing core 10k deck with words from Matsushita?

2) The first 2k entries had pictures included. I generated pictures for the 2k-10k range. As for audio, it was already included for all cards in the 0k-10k range. I also wrote my own tool to dump data from a dictionary, to have good examples, etc. I wrote a pretty good heuristic algorithm that tries to figure out where the word is in the sentence. It works most of the time, but for words in the 0k-10k range I already had that information. As for the other columns, I generated them using my own tools, a list of kanji frequencies, etc. Unfortunately, those tools are hard to use and not available anywhere.

Core 10k - optimized i+1 version - MaxHayden - 2014-07-07

pmnox,

Thanks for your quick reply and for your offer of help. Ideally I'd like to avoid as much duplication of work as possible while improving what we have. I'm generally open to ideas as to the best way to achieve that.

1) What language are your scripts in?
I do software stuff for a living so I could probably figure them out unless they are really obscure.

2) The Matsushita list is comprehensive (he has about 60k words in the main list and then a bunch more in the "assumed known" list and "narrowly ranging" lists). But if you cut it off at 99% coverage, it would have 31k words. If you cut it at 98% it would have 20k. If you cut it at 10k, you'd have 95% coverage. So how about I try to figure out how these lists line up with your two decks and then we'll decide what to do. (I'll also look at his "literary" and "academic" words lists since these give high coverage for some types of writing.) If his 10k is only a little different, then that's one thing, but if it's 40% different that's another matter entirely. (FWIW, I also contacted the publishers of the two printed graded reader series to see if they had a vocab list so that I could add tags for people who wanted to use them, but if they don't have the lists pre-made, I'm not going to include that information unless someone else goes through the books and makes the list for me.)

3) I'd be interested in any ideas you have that would make things better in general / that you would do differently if you had the time / etc.

4) I'd also be interested in your thoughts on the merits of using the KO2k1-based kanji list you used vs. using Matsushita's kanji frequency list to group the kanji into KO2k1-sized groups and then sorting inside those groups by RtK order (or some other ordering principle). Is there something special about the KO2k1 order or is it just an older frequency-based sort order?

5) Ideally, I'd also like to have a keyword/mnemonic set of fields for the vocabulary like we have for the kanji, but AFAIK no such information currently exists. So unless someone comes forward with that information, this is going to be one of those "it would be nice" sort of things...

Core 10k - optimized i+1 version - pmnox - 2014-07-08

1) The code is written in Python.
However, it's not written for readability. I'm not sure if it will be of any use to you, but I can share a dropbox link to all the files that I used to generate this deck.

2) Cool. How is your progress on learning Japanese? Are you an advanced learner or a beginner? I heard that a lot of people stop using lists after learning 6k-15k words and then they just read native material.

3) I added all the features that I wanted. I would probably have removed the production cards if I were to do this deck again. I don't find them useful anymore. I'm not even sure if studying more than 10k cards out of context is useful. When I see a new word I simply enable the card from the 10k-25k cards list, and I add it to the list of all cards I learn.

4) I used KO2k1, because they ordered kanjis in a logical way. Kanjis like 0, 1, 2, ..., 9 would appear together. Also kanjis that have similar meanings like left, right, etc. I didn't want to use RtK order, because I learned all kanjis before I started studying this list. I still think that trying to learn new kanjis and their readings at the same time is counterproductive. At the very least, my attempt to learn that way a few years ago failed.

5) Do you want to have RtK mnemonics for all kanjis in each word? Or are you trying to do something different?
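MaxHayden's plan above to check how a frequency list "lines up" with the existing decks could be sketched roughly like this. It is only a minimal illustration under stated assumptions: `overlap_report` and its parameter names are hypothetical, the words are toy data rather than entries from the actual Core or Matsushita lists, and matching is by exact written form, which would miss spelling variants.

```python
def overlap_report(core_words, freq_ranked_words, cutoff):
    """Compare the top `cutoff` words of a frequency-ranked list with a deck.

    core_words: words already in the deck (any order).
    freq_ranked_words: words sorted from most to least frequent.
    """
    core = set(core_words)
    top = freq_ranked_words[:cutoff]
    # Frequent words the deck does not contain, kept in frequency order.
    missing = [w for w in top if w not in core]
    return {
        "shared": len(top) - len(missing),
        "missing_from_core": missing,
        "fraction_different": len(missing) / len(top) if top else 0.0,
    }


# Toy data only, not taken from either real list.
core = ["魚", "一般", "資産", "財産"]
ranked = ["魚", "一般", "予防", "防止", "資産", "漁"]

report = overlap_report(core, ranked, cutoff=5)
print(report["shared"])              # 3
print(report["missing_from_core"])   # ['予防', '防止']
print(report["fraction_different"])  # 0.4
```

The `missing_from_core` list is the part that would need new cards filled in, so its size at a 10k cutoff is what decides between "a little different" and "40% different".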