Study method, Unihan, CC-CEDICT - Printable Version
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Chinese (http://forum.koohii.com/forum-17.html)
+--- Forum: Chinese and Hanzi (http://forum.koohii.com/forum-20.html)
+--- Thread: Study method, Unihan, CC-CEDICT (/thread-13318.html)
Study method, Unihan, CC-CEDICT - gdaxeman - 2012-10-02

Note: I wanted to organize my thoughts about the method I'm using now and share some ideas, and this is the result. Never mind the "article-like" formatting; a big block of text was not looking good at all, so I felt I had to do it this way to make it more readable.

Keywords: Unihan 6.1.0, CC-CEDICT, Anki 2, study method.

Download: Here's my huge deck!

JimmySeal's method, reading fluency, Unihan

I've been using a method similar to JimmySeal's and I can say it's a very good one to improve reading fluency, although I think maybe it's better if you already have a good understanding of how the language works, the sounds of it and all, and not dive into it right away. Maybe, because that was my situation; JimmySeal's situation was that he was already proficient in reading Japanese, so that can also be considered unusual.

Anyway, there are some differences in what I do that can be of interest to others. For example, I've created a Unihan 6.1.0 deck, so now I just read things, mostly native materials from the Internet that can be easily selected and copy/pasted, and unsuspend the hanzi that I find interesting, usually characters that I don't recognize right away or whose readings I don't remember. This way, just like JimmySeal, the characters I have for review are all relevant to me; it's not just a random list. A personalized list with things that you've seen in materials that you are interested in can be very helpful to make your brain treat it as useful information, not as something 'useless' that it wants to discard right away. So, by having this Unihan deck all set now, I don't have to add any field manually, which is a big plus.
Also, Unihan works perfectly as an Anki deck, as it has all the nice fields such as kMandarin (the most common reading), kXHC1983 (Xiàndài Hànyǔ Cídiǎn readings), kSimplifiedVariant/kTraditionalVariant (which really help when studying both sets of characters) and kDefinition (the core definition in English for the character in Modern Chinese). I also added fields with frequency data for both the Simplified and Traditional characters, which I use for sorting.

CC-CEDICT, sounds and mined deck

In addition to that, I also have a CC-CEDICT deck that I use when the character has a different meaning or reading when it's part of a compound word. Adding these two decks, there are almost 180,000 cards; Anki 2.x deals very well with this, while 1.x was not even a choice, too slow for big decks with many fields, at least on my computer. I also have sounds for all syllables used by the hanzi, except for most neutral tones, and I use TTS for the words, played in real-time, so I don't have to generate and deal with thousands of mp3 files. The TTS for Mandarin I use is surprisingly good, better than the ones I've used, or still use, for other languages. And CC-CEDICT is a really good fit for Anki for multiple-character words given the way the entries are formatted, all in a single line and as short as possible. It fits better than the commercial dictionaries I've seen (Wenlin, for example, is the top choice for many things but it doesn't fit well for reviewing with Anki).

I also have a Mined deck where I add words that are not in Unihan/CC-CEDICT, such as many names (奥利维亚·巴勒莫 [Olivia Palermo], 莫妮卡·贝鲁奇 [Monica Bellucci], and 辛德瑞拉 [Cinderella], this kind of thing) and words that, who knows why, were never included in any of the dictionaries I have.

English definitions

The definitions are all in English as I mentioned already, not monolingual, but I think it really doesn't matter.
Leave the monolingual interpretation for when you're reading real native stuff, and bilingual for when you don't understand a word (or many words.) Sharing? I've been thinking about sharing the Unihan+CC-CEDICT decks, but I've been procrastinating for too long and I just lost momentum. I would have to make some changes to make it more ready for consumption by others (and I don't know how to deal with the TTS and media, which are not free but I feel it really makes a difference; also, the TTS in Anki 2 for now needs additional configuration to work correctly, such as changing locale to Chinese), and it doesn't seem there are people in here that could be interested and which are using Anki 2 to study Chinese... I could be wrong, though (waiting to see if there is anyone who says that I'm wrong, or else I'm right, right?) Memorizing and reviewing I should add that memorizing and reviewing things this way, with isolated characters and trying to recall the most common reading and at least a gist of the meaning, can be really mentally demanding (and yes, I certainly know about sentences because it's what I used before, but I prefer this way now.) I've tried something similar when I started studying Chinese and it simply didn't work at all, but now that I have many more 'hooks' to use things are much better, which is why I mention the importance of knowing at least some amount of Chinese. There are a lot of mnemonics involved, because pure brute force doesn't work well except for certain things, such as very often used words and such. Heisig I don't try to recall nor do reviews based on Heisig's keywords anymore, but I believe the method was one of the most helpful things I've ever done. Now I use a variation of the method that I've come up with for the readings, but it's really hard to explain it as a whole because the part that really works is the mental image, now with sounds, plus practicing it a lot, not just hearing another person talking about the basics again. 
Actually, I had read about the underlying method that Heisig uses in a book about improving memory long before I even started studying Japanese (which was before Chinese; and it was a really old book, I believe much older than the first kanji book by Heisig). It's one of the methods employed by people who can show those "amazing feats of memory" (there are others, but all of them tend to be based on mental associations), and it's all about the mental image; no need to write stories down or anything like that, unless you want to share them, sure. Writing them down can be detrimental because it slows you down and it makes you keep stories that don't work well. The way I do it now is that, if I don't remember what I want to remember with the mnemonic "story" I had (in quotes because it's not actually a story, it's more like a short animated GIF), I just create a new one on the fly.

Study method, Unihan, CC-CEDICT - Dunki - 2012-10-02

Hello gdaxeman! Thanks for sharing your ideas. These are very detailed and very interesting.

gdaxeman Wrote: I've been using a method similar to JimmySeal's and I can say it's a very good one to improve reading fluency, although I think maybe it's better if you already have a good understanding of how the language works, the sounds of it and all, and not dive into it right away.

Again, thank you for being "honest" and warning us about this potential but genuine pitfall. I also want to dive into native materials as soon as possible, but let's face it! I don't even know what a sentence is, and know absolutely no words beyond the one-hanzi words given by Heisig. The way I kick-started my Chinese studies was to begin with the Assimil method. I don't know if it is widespread, but I think this method is quite solid if you add the power of Anki on top of it. Very efficient and not as painful as I had first imagined.
gdaxeman Wrote: In addition to that, I also have a CC-CEDICT deck that I use when the character has a different meaning or reading when it's part of a compound word. Adding these two decks, there are almost 180,000 cards; Anki 2.x deals very well with this, while 1.x was not even a choice, too slow for big decks with many fields, at least on my computer.

I also have a vocab deck, not as focused as yours but with pretty much the same fields you mentioned. But what impressed me (and also depressed me) is that you are talking about 180K cards!!! It took me a year to complete RSH1&2; I can't imagine aiming for Chinese fluency (kind of^^) and having to deal with such metrics. Because if you already have 180K cards and still continue to add new items, then you are still not satisfied with your Chinese proficiency level. That's what scares me!

gdaxeman Wrote: ..., and I use TTS for the words, played in real-time, so I don't have to generate and deal with thousands of mp3 files. The TTS for Mandarin I use is surprisingly good, better than the ones I've used, or still use, for other languages.

Would you mind detailing the TTS software that you are using? What are the TTS issues that you encounter with Anki? You also spoke about copyrighted media: what have you used?

gdaxeman Wrote: ..., and it doesn't seem there are people in here that could be interested and which are using Anki 2 to study Chinese...

I am currently using Anki1, but as soon as Anki2 becomes vanilla, I plan to jump in. He he! Maybe it's time for someone to post a how-to on upgrading from Anki1 to Anki2... (Follow my glance...^^) I am totally relying upon Anki for my Chinese studies. Right now everything I study must be put in Anki. I know it's harsh, but it's my price for being fully autonomous. Later I hope to do fewer Anki reviews and tackle more native materials (but I'm nowhere near that at the moment!).
Furthermore, I am not sure I would be interested in your deck, because I have only recently discovered that a deck must be built piece by piece and with your own pieces (cards?). And finally, I am more inclined toward the way that you have built it (this is what you explained).

gdaxeman Wrote: I don't try to recall nor do reviews based on Heisig's keywords anymore, but I believe the method was one of the most helpful things I've ever done.

I can only second you on this one, gdaxeman.

gdaxeman Wrote: Writing them down can be detrimental because it slows you down and it makes you keep stories that don't work well. The way I do it now is that ..., I just create a new one on the fly.

This is exactly what I suffered from, and I answered it like you did! Just drop the story that does more harm than good and go create a new one. But you have to experience it first before jumping over the obstacle. The huge volume of all the things left to do is what pushes me forward, and this is also my answer to those obstacles. Straight to the point! (Occam's razor)

Regards, Dunki

Study method, Unihan, CC-CEDICT - gdaxeman - 2012-10-02

Dunki Wrote: ... But what impressed me (and also depressed me) is that you are talking about 180K cards!!! It took me a year to complete RSH1&2; I can't imagine aiming for Chinese fluency (kind of^^) and having to deal with such metrics. Because if you already have 180K cards and still continue to add new items, then you are still not satisfied with your Chinese proficiency level. That's what scares me!

No no, 180,000 is the total number of cards in the deck, but you don't really need to know them all to reach reading fluency (or, better yet, complete proficiency).
41,406 of these cards are words with more than two characters, which often can be broken into multiple two-character ones without causing any problems in understanding them, or even into multiple single-character ones, unless it's a specific idiomatic expression very different from the ones in English, or when it's a name. That's where most of the variance in the number of entries in Chinese dictionaries often come from, by the way — some simply don't include much of these types of entries. The Unihan part also has a lot of characters that are not used in modern Chinese, and it offers the simplified and traditional variants in independent cards (when they're different), which makes the numbers bigger too. The way I use it is by unsuspending the cards that interest me as I see fit while I read things from native sources, and things go very well that way. I don't know if I will ever unsuspend all of them so that they are all included in my reviews; it could happen (who knows), but it would take more than 12 years by adding 40 new ones a day to be reviewed. Sharing So this is what I mean when I think about sharing the decks: it would be for those who were after a "dictionary deck" that had most of the things they would find in their reference materials, so that they can simply unsuspend things they find interesting to review, put their own additional information in an "Additional Info" field, and then, for the most part, they just have to add what's missing, such as names and certain types of slang. I'm sure there would be some people out there interested in this, just like I was; I'm just not sure if most who think they want something like this would benefit much because, in my experience, simply having a list of words created by others is usually not that helpful. Even then, maybe I will put it into Anki 2's shared deck after the final release version comes out and then people can do whatever they want with it. 
Sentences

One more piece of information: I've recently (many weeks ago) deleted all of my Chinese 'sentence' decks, which had around 7,000 unsuspended cards I used to review, and started using this more focused method exclusively. You can reach that same number in less than 6 months by simply adding 40 new cards a day without missing a day, but I feel that I could have switched to the method I'm using now sooner. Anyway, I believe graded sentences, such as the ones in the Smart.fm deck, ChinesePod, Pimsleur or Assimil, can be very helpful to get started. It's just that I feel there's no need to keep reviewing them forever and ever. Unless you want to.

Mining and reviewing sentences can also be really time-consuming, and they either get too repetitive or too long, which goes against the optimal way of using an SRS, that is, the minimum information principle mentioned by SuperMemo. Reading is much more fun, and now I only use SRS to strengthen the specific pronunciation(s) of the character, with tones and all, plus refresh its meanings in my mind, so that I'm able to read it all in the most accurate way possible and without depending on having a specific sentence pattern to remind me of what the word or character is all about. In a sense, one (SRS) complements the other (reading). One without the other would be like walking somewhere using only one leg: you might arrive where you want eventually, but it will take much longer and will be more tiring.

TTS

The TTS I use is Neospeech's Liang, a male voice, which is perfect for compound words (I really mean words, not sentences). After real natives speaking with good accents (the best of the best), this is the one to go for if you want to hear a voice that wouldn't make you sound too horribly incomprehensible to native speakers if you ever tried to copy it (with a human touch of course, not 'breaking' the voice as TTS engines sometimes do).
For sentences it sounds ok, but it doesn't have "emotion" and doesn't use intonation depending on context as humans do, so if you copy it faithfully you may end up sounding really weird. Not that I suggest doing that, though; I'm just mentioning how good it is for compound words in Anki. For individual syllables, I use the ones from the ChinesePod Pinyin chart, which sound very natural, but they are all spoken by women with a soft voice. I also have a male voice from another source which is good but I would have to make some trimming for it to work better in Anki, such as removing the white space in the beginning, but haven't done that yet and it's a little bit of work as I couldn't do it in batch without having to fix anything later as I did with the ChinesePod ones. The pinyin syllables are all from 'free' sources (as in beer; you can hear the original ones in the website, or download them, without problems), but I don't know if they are free for distribution in another type of work. Actually that's probably not much of an issue if you come to think about it, but it would be nice to have a deck that is really 100% free and unencumbered. The issue with TTS in Anki 2 is that the voice I mentioned doesn't work with the plugin if you don't change your locale to Chinese; it simply doesn't say anything at all [update: this was fixed in the latest version.] Also, in the current version you have to change some settings in a text file to make it play automatically (works much better with the deck), and some people break things when they change configs in text files so they prefer never having to do that. I suppose. Nevertheless, it's the greatest plugin ever for TTS with SAPI voices in Anki 2 (I'm talking about AwesomeTTS, by the way.) 
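As a quick sanity check on the pace figures mentioned earlier in this post (roughly 180,000 cards, or 7,000 cards, at 40 new cards a day), here is a small back-of-the-envelope sketch. The function name and the average lengths used for a year and a month are choices made for illustration only, and the calculation covers introducing new cards, not the review load that accumulates on top.

```python
def days_to_add(total_cards: int, new_per_day: int) -> int:
    """Number of days needed to introduce every card at a fixed daily pace."""
    # Ceiling division: a partial last day still counts as a day.
    return -(-total_cards // new_per_day)

full_deck_days = days_to_add(180_000, 40)  # whole Unihan + CC-CEDICT deck
sentence_days = days_to_add(7_000, 40)     # size of the old sentence decks

# 365.25 days per year and 30.44 days per month are rough averages.
print(f"{full_deck_days} days, about {full_deck_days / 365.25:.1f} years")
print(f"{sentence_days} days, about {sentence_days / 30.44:.1f} months")
```

This agrees with the rough figures in the post: a bit over 12 years for the whole deck, and a little under 6 months for 7,000 cards.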
Study method, Unihan, CC-CEDICT - gdaxeman - 2012-10-03

To give some examples of how I format the Unihan cards:

[image: Unihan card examples]

The question shows the hanzi only; then I have to remember the most common pronunciation, and at least think about the meaning (not necessarily in English; it can be just a 'feel' for the Chinese word). The back shows the answer with audio, plus the definition, all the readings defined in the kXHC1983 field, and the traditional / simplified variants. The deck actually has 40 fields but I don't use all of them for reviews; they're there mostly for curiosity (e.g., Hanja readings in Hangul, which don't matter much to me as I'm not studying Korean).

Study method, Unihan, CC-CEDICT - Warp2243 - 2012-10-04

Hi gdaxeman, thank you for sharing your study method and your deck organization. I would be very interested in such a deck in the near future, but not exactly now.

I've been learning Japanese for a little over 1.5 years, during which I did RtK1 and then the whole Kore 10,000 (from Smart.fm + the JSensei iPhone app). Taking this as a starting point in my Japanese studies and seeing how insanely efficient this method has been, I was thinking I would continue to rely heavily on sentence mining up to fluency. I added about 1,000 Japanese sentence cards, mined from dictionaries (almost 100% of the time these 3 sites do the trick: Tangorin (JMDICT), SpaceAlc, Naver).

Now I started Chinese from scratch 3 weeks ago, crammed the Smart.fm Chinese Core deck of 5,350 sentences (which made me realize how much faster I should have been for Japanese...), and now I'm a little busy with the 600+ reviews a day (but that's totally manageable, see further). I modified the deck along the way to make it look like the Japanese one; by that I mean I added a word in Chinese + pinyin + English translation to every sentence, trying to choose unique words as much as possible, thus making the deck into a full dual word+sentence card deck.
I originally did this because I like to track my vocabulary count precisely, but right now it's proving very useful for getting through the huge daily reviews (I'm reviewing it as a vocabulary deck, and will go back to sentences when it has decreased to 100/200 sentences a day).

So, regarding your topic now. Like you said, I believe it's important to have some familiarity with the language before going full vocab decks, no sentences. I can't imagine myself having learned words only for Chinese, and even less for Japanese, whose grammar is much less obvious at first. But right now I'm starting to think that I could deal with word-only cards in Japanese, and still memorize them quite well. Not completely sure about this since I haven't tried it yet, but I'm really considering it right now. If it works as well as it does for you, I'll also do the same for Chinese in a few months. So at that time I'd be interested in your decks, or even raw data (I can arrange it myself; you wouldn't need to prepare/polish it or whatever, I think).

I'm also interested in the kanji/hanzi data you have (and that would be right now). Currently I have a 6,400-character deck; I think it comes from KanjiDIC (I imported it from a Google Doc found in this forum). It has several English meanings, RtK meanings, most if not all ON and Kun readings, example words, stroke count, KanKen level, and obviously all the stories that I wrote myself (in French). Until recently I've been satisfied with it, but now that I've started Chinese it's not enough. I'd like the following:

(1) A LOT more characters: Japanese-wise it was sufficient, but now I've been adding a few hundred cards for Chinese-only characters as I encounter them, and obviously I'm only inputting the essential information in those (English keyword and mnemonic story, no pinyin, stroke count or whatever). I'd like to import right now about ALL characters, kanji or hanzi, that I could ever find in my life, which would probably make it a 15k-20k character deck at least. That way I'd just have to unsuspend.

(2) Once the characters are all in the deck, I want to import the pinyin readings too. If possible I'd also really want the pinyin AND the character to be colored according to tone (see Pinyin Toolkit). I realize I've memorized the tones of over 75% of the 2,700 hanzi in the Core deck so far, just by passively seeing the colors, which I find insanely nice.

(3) Quite important too, and you mention it's included in your deck: I want to learn both simplified and traditional systems. By that I mean I want to read simplified (I don't really care about writing it, even from memory), and read AND write traditional (in my characters deck I add ALL traditional hanzi that do not exist in Japanese or that I've not encountered yet, and I make mnemonics for those specifically, see picture below). So I'd love it if I could see the triples Kanji/Trad./Simpl. when reviewing the characters, to strengthen the association between the three.

(4) I'd like the Korean readings too, since I plan to learn it in one or two years, and from what I understand, you can rely heavily on hanja for this (which will make it a piece of cake, just like it has been for Chinese...).

(5) I'd also like to import all the usual character-related data: frequencies, English keywords (like on your cards), stroke count, and I want to do this for all characters in one go, so that I won't ever have to bother with this for the coming years.

Well, in short, I want the ultimate character deck for someone planning to learn all CJK languages up to fluency. And you seem very close to own something like this.

By the way here is how my decks are formatted (and an example of how I would like the tone coloration and the kanji/trad/simpl variants in the character deck) :
Study method, Unihan, CC-CEDICT - gdaxeman - 2012-10-05 That's right Warp2243, I believe that sentences are better for those who are starting, even if they focus on one word in each sentence as you have done, and it can give a very good foundation in the language depending on how they deal with the rest of the sentence. Then you can progress to adding only single-hanzi and isolated vocab cards for more efficiency, because sentences become too long and too repetitive to be reviewed all the time as I mentioned, and you see the same patterns all the time when reading with the only difference being the vocabulary. I mean, that at least for the sentences added for the sole purpose of studying the sentences themselves; it's definitely possible to use an SRS with the language as the means to learn something else, and that's another thing entirely (e.g., studying about history, technology or medicine in Chinese, something really nice to do.) Obviously, when the pattern of "only the vocabulary is different" doesn't hold, you can add the entire sentence, or the fragment of it that is different which you want to remember. It's a very flexible system. Quote:I'm also interested in the kanji/hanzi data you have (and that would be right now).So the Unihan is the deck for you! It has 75,619 characters, all the ones that you will find in Unicode (unless it was missing when importing, not discarding that option for the less common ones.) You even have to use a special font — I use HanaMinB —, or else all of the less-common characters appear as square boxes. 20,926 of the characters have the field kDefinition filled, and that's pretty much the entire amount of characters that are used in Modern Chinese. 20K is not much if you think about it, and many of these are even very situation-specific and rarely used most of the time. 41,208 of them have the kMandarin field filled, but they don't have definitions if they fall outside the previously mentioned 20K list. 
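For anyone who wants to reproduce counts like these, here is a minimal sketch that tallies how many characters have a given field filled. It assumes the plain-text layout of the Unihan data files (tab-separated code point, field name and value, with `#` comment lines). The sample lines and the function names are made up for illustration, and totals from a real file will depend on the Unihan version and on how the deck was imported.

```python
from collections import Counter, defaultdict

def parse_unihan(lines):
    """Collect Unihan fields per character from tab-separated lines."""
    records = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        char = chr(int(codepoint[2:], 16))  # "U+4E00" -> "一"
        records[char][field] = value
    return records

def field_counts(records):
    """Count how many characters have each field filled."""
    counts = Counter()
    for fields in records.values():
        counts.update(fields.keys())
    return counts

# Hand-written sample in the Unihan layout (not copied from the database).
sample = [
    "# comment line",
    "U+4E00\tkDefinition\tone",
    "U+4E00\tkMandarin\tyī",
    "U+4E8C\tkMandarin\tèr",
]
records = parse_unihan(sample)
counts = field_counts(records)
print(counts["kMandarin"], counts["kDefinition"])
```

With the real data, one would pass an open file instead of the sample list, for example `parse_unihan(open("Unihan_Readings.txt", encoding="utf-8"))`.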
So, 20,926 characters for Modern Chinese and that's all, including both simplified and traditional forms... but now to scare (or to interest) you: if you dive into Classical Chinese with the original texts, then this number can become much bigger, because there has been a lot of variation in how to represent a character across the centuries, even if the variants mean exactly the same thing; it's like the Simplified/Traditional thing, but spanning millennia. Classical texts can even present characters that have never been encoded in any computer format yet, but if you reach that stage then you would probably have access to other sources you could add to the deck, I suppose, maybe all of them entirely in Chinese. I'm not into this Classical thing (yet, at least) so I don't know what could be done in that case, but I guess the very same structure could be used.

Anyway, this deck includes all the Japanese characters you would want too (I guess), but they are not referenced in relation to the Chinese ones, so you would have to add that information if you really wanted to use it. There aren't that many characters that are Japanese-specific variants, maybe fewer than 300 or so (not sure), so it wouldn't be too much work to reference those, but it would all have to be done manually. The CC-CEDICT deck lists 206 characters that have "Japanese variant" in the definition, so you can use that as a source for this info. Also, if you're interested in knowing whether the character used in Japanese comes from the Chinese traditional or simplified form, you would have to use something else to create that reference, maybe a frequency order list compiled from texts written in modern Japanese, then do some more work to organize that into the cards. The deck pays more attention to Chinese, but since the character field is unique, you can use it to import anything into it.
The only thing to notice is that [technical content ahead], at least for now, you have to use the UTF-8 field (encoded) as the reference field, with the contents all in uppercase, because Anki 2 doesn't update cards correctly if you use the Han field (decoded). Wenlin plus Notepad++ help with that if you don't already have the decoded form in uppercase in that which you would like to include, but if you do then it's easy. Also, it has the pinyin for sure, including variants, but it doesn't color-encode them so you would have to use another thing to do that, such as the Pinyin Toolkit you mention, or a custom script. It has the Korean readings (kHangul and kKorean fields) and the Japanese readings (kJapaneseKun and kJapaneseOn, but these are in romanized form and I don't know if they are complete.) It also has the Vietnamese readings in Quốc ngữ, but I'm yet to see anyone studying Vietnamese and who would like to know this, given that Vietnam has abandoned Chinese characters even earlier than South Korea. Anyway, it's fun to have it and it was already in the same Unihan Readings file. Quote:(5) I'd also like to import all the usual character-related data : frequencies, English keywords (like on your cards), stroke count, and I want to do this for all characters in one go, so that I won't ever have to bother with this for the coming years.You will have most of this all set! For Chinese I mean, as I mentioned about the focus the deck originally has. Then it's just a matter of importing additional fields, which in Anki 2 is very fast and easy to do, such as Japanese frequency orders, example sentences, Heisig info and stuff. Quote:Well, in short, I want the ultimate character deck for someone planning to learn all CJK languages up to fluency. 
And you seem very close to own something like this.

Yeah, the characters are all there; Unihan hardly misses a character used in modern contexts, and if one is missing from Unihan, most people would never be able to see it on a computer anyway because of the lack of support in the fonts that come with the operating system. So the remaining part concerning the languages themselves is just a matter of compounds, language-specific words and meanings, and grammar. When it comes to reading, at least.

Quote: By the way here is how my decks are formatted (and an example of how I would like the tone coloration and the kanji/trad/simpl variants in the character deck) :

I see. I believe you could use Pinyin Toolkit for that in this deck too, for the characters it recognizes (but Pinyin Toolkit hasn't been fully ported to Anki 2 yet, though the developers said they're working on it). I don't know how it chooses how to colorize the characters that have multiple pronunciations, but I see it uses CC-CEDICT and Unihan too, so if the coding is right, it will be mostly the same thing. If it's not right, then it will have some mismatches (some people have complained about something related in older threads, about it choosing the wrong tone for a character and such, but I don't know how frequent that is).

Release date

Now to confirm it: I will be releasing all this in the Anki 2 shared decks, if it's accepted there, a little after the final version of the app is released, because by then many people will probably be using the app and it will be only a matter of clicking on the downloaded deck to use it, with no need to deal with any type of configuration (aside from the TTS thing for the CC-CEDICT deck, if they want). Then everyone will be able to have a deck with all the hanzi in Unicode, all the time.

Study method, Unihan, CC-CEDICT - bflatnine - 2012-10-05

No real comment on your method, but I'd advise you to be careful about viewing the simplified/traditional thing as a simple 1:1 conversion.
It's far from that. For instance, in that one image you posted above, two pairs of characters are wrong. 鬥 is not simply the "traditional version" of 斗. They're two different characters. 斗 means the Big Dipper, while 鬥 means to fight or struggle. So you have 鬥牛 and 斗牛, which are two different things (the first means bullfighting, the second refers to the Big Dipper and Altair constellations). Unfortunately, they're both written 斗牛 in simplified Chinese. Similarly, 颳 and 刮 are two different characters in traditional Chinese. The first means to blow (referring to wind), while the second means to scrape or shave. So you have 颳風 (to be windy) and 刮鬍髭 (to shave one's beard). Hope this helps.

Study method, Unihan, CC-CEDICT - gdaxeman - 2012-10-05

Yes bflatnine, that's one thing people should have in mind when using those fields. The traditional/simplified variant, when available in a card, should be used mostly as a point of reference, for creating some sort of familiarity with the variant character, knowing that it doesn't take meanings and context into account. That is, the field shows that a variant form exists, even if the change depends on context, not that it can be converted directly without any problems. With most characters there are no problems, but not with all of them. In a way it's interesting to know this in order to understand better how character-by-character conversion tools work; these are the most common type of conversion tool available for Chinese, so people using them will often face this multiple-mappings issue, even more so if the conversion is from simplified to traditional. Unihan warns about this in the documentation:

Quote: The kTraditionalVariant and kSimplifiedVariant fields are used in character-by-character conversions between simplified and traditional Chinese (SC and TC, respectively).
For any character X, when converting between SC and TC, there are four possible cases:

If the person wants more accuracy in this regard, they should use the CC-CEDICT deck rather than the Unihan one (considering what's being offered in this thread only), as it takes meanings into account when grouping the traditional and simplified pairs.

I also suggest people not become too reliant on the English meanings, thinking they are all-inclusive for all situations, because they're not. They never are, even in the biggest dictionaries you can find, in any language. In Unihan's case, they use very short definitions, which work better with an SRS than the big blocks of text that more inclusive dictionaries tend to present (these are better left as reference), and then you can complement them with your own additional information if you want. Piece by piece. That's how people learn things and gain experience, by starting small and then expanding on that knowledge.

Study method, Unihan, CC-CEDICT - gdaxeman - 2012-10-05

Anki 2.0 was officially released! So, as I promised, here's the Unihan and CC-CEDICT deck:

→ Unihan 6.1.0 and CC-CEDICT decks for Anki 2 ←

In the end I decided to place it on Google Drive rather than AnkiWeb, at least for now. To use the deck, simply download it, press import (File > Import [Ctrl+I]), add it to your profile and there you go.

Size: It's a big deck, with 179,351 cards, 30 MB when compressed and 63 MB when uncompressed. It was made with Anki 2, so it doesn't work in version 1.2.

Fonts: I suggest installing both fonts from the hanazono-20110915.zip file (HanaMinA.ttf and HanaMinB.ttf; the latter is more complete) so that you are able to see all the characters from Unihan, or else many of them, the less common ones, become invisible or are displayed as square boxes.

Description: Deck created from the Unihan 6.1.0 and CC-CEDICT databases.
Not all fields from Unihan were used, but everything that could be used for studying Chinese with an SRS is there. Notice that, since the content for the Unihan deck comes from the Unihan database, there are a lot of fields that are empty, even relevant fields such as kDefinition (definition in English for modern written Chinese) and kMandarin (most common pronunciation), but in these cases it's mostly for characters that aren't used in Modern Chinese at all.

Suggestion: Suspend everything and then unsuspend the characters and compound words that you find interesting to review while reading things in Chinese (anywhere, including textbooks, subtitles and 暴走漫画). You can also use the frequency order fields to see if you are missing something you are familiar with and that you'd like to include in your reviews to create stronger memories. In my opinion, the Unihan deck is better for reviewing single characters and the CC-CEDICT deck is better for reviewing compounds. The latter can have duplicate characters on the front if the reading is different, or if it has multiple simplified/traditional variants, so in these cases you need to add something for disambiguation or else it becomes hard to know what the expected answer is. Also, the Unihan deck has sounds for the syllables, except for the neutral tones.
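The duplicate-front issue mentioned above can be spotted mechanically. The following is a minimal sketch, assuming the usual one-line CC-CEDICT layout (`traditional simplified [pinyin] /definition/.../`) and a card front that shows the simplified form. The sample entries are hand-written in that layout for illustration, using the characters discussed earlier in the thread, rather than copied from the dictionary, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# One entry per line: traditional, simplified, [pinyin], /definitions/
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")

def parse_cedict(lines):
    """Yield (traditional, simplified, pinyin, definitions) tuples."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        match = ENTRY.match(line)
        if match:
            trad, simp, pinyin, defs = match.groups()
            yield trad, simp, pinyin, defs.split("/")

def ambiguous_fronts(entries):
    """Return simplified forms that map to more than one entry."""
    by_front = defaultdict(list)
    for trad, simp, pinyin, _defs in entries:
        by_front[simp].append((trad, pinyin))
    return {front: v for front, v in by_front.items() if len(v) > 1}

# Illustrative entries only, written by hand in the CC-CEDICT layout.
sample = [
    "斗 斗 [dou3] /dry measure/",
    "鬥 斗 [dou4] /to fight/to struggle/",
    "刮 刮 [gua1] /to scrape/to shave/",
    "颳 刮 [gua1] /to blow (of the wind)/",
    "風 风 [feng1] /wind/",
]
for front, variants in ambiguous_fronts(parse_cedict(sample)).items():
    print(front, variants)
```

Fronts flagged this way are the ones where a hint (the other script's form, a reading, or a short gloss) would be needed on the question side.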