kanji recall rate statistics - Printable Version

+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Remembering the Kanji (http://forum.koohii.com/forum-7.html)
+--- Thread: kanji recall rate statistics (/thread-249.html)

kanji recall rate statistics - laxxy - 2006-11-02

Combining the dataset on kanji review failure rates kindly provided by Fabrice with the kanji data in Edict, I constructed a dataset on the RTK1 kanji, consisting of:

Code:
obs: 2,042

First, some summary statistics:

Code:
. su sfc ssc str tt frequency grade strokecount

Code:
. cor sfc ssc str tt frequency grade strokecount

Code:
+-------------------------------------------------------------------------+

Code:
+--------------------------------------------------------------------------+

There are quite a few kanji with a high Heisig number here, especially in the top 10. It could be due to Dr. Heisig leaving the harder kanji for later in the text, or perhaps people just haven't learned them well enough yet, and there can be many other explanations:

Code:
.395062 +

It is possible that those would be the kanji that could benefit from better stories the most.

First, let's fit a simple linear model:

Code:
RECALL = b0+b1*FRAMENUM+b2*GRADE+b3*STROKECOUNT+b4*FREQUENCY+eps

This model misses a few kanji, because some of them either have not been assigned a school grade or do not appear in the newspaper frequency dataset. I introduce two dummies (GRADEMISS and FREQMISS, equal to 1 for the missing observations and 0 elsewhere) to account for those. In addition, I add two more variables -- dummies for the kanji in Part 1 and Part 2 of the Heisig book (the first part has detailed stories by Dr. Heisig, and the second has less detailed ones; iirc, the first part is also available online):

Code:
RECALL = b0+b1*FRAMENUM+b2*GRADE+b3*STROKECOUNT+b4*FREQUENCY+

Dropping Pt. 2 from the model, here are the estimation results:

Code:
. regress tt framenum grade0 grademiss stroke freq0 freqmiss pt1

Code:
+-------------------------------------------------------------------+

As we can see, these are somewhat more evenly distributed over the book length:

Code:
.395062 +

Probably the best thing would be if we could think of what stories we ourselves used there, whether they helped, and, if they did, whether they could be shared with the community.

For comparison, here is also a list of the 10 easiest kanji after controlling for the factors in the last model:

Code:
+---------------------------------------------------------------------+

kanji recall rate statistics - leosmith - 2006-11-02

Cool post laxxy! A couple of comments. I found it interesting that "one" didn't make the easy lists. As for the difficult lists, I find the 20 in your first list harder to remember than the ones in your second list. I finished book one almost a year ago, and have had all the flashcards in supermemo for about 8 months now. I'm not judging your adjustments, but I thought you might want to know. I found it very frustrating that I couldn't get my 90% on either of the hard lists, so I believe these really are some of the hardest. Thanks a lot - this stuff is really interesting to me.

kanji recall rate statistics - laxxy - 2006-11-02

leosmith Wrote:
Cool post laxxy! A couple of comments. I found it interesting that "one" didn't make the easy lists. As for the difficult lists, I find the 20 in your first list harder to remember than the ones in your second list. I finished book one almost a year ago, and have had all the flashcards in supermemo for about 8 months now. I'm not judging your adjustments, but I thought you might want to know. I found it very frustrating that I couldn't get my 90% on either of the hard lists, so I believe these really are some of the hardest. Thanks a lot - this stuff is really interesting to me.

That's a very good point; I am actually not sure which list could benefit more.
I wanted to make a small poll on a non-overlapping section of the two lists at first. For me, too, the first list seemed harder, but I thought it could be just because I was at the end of the book myself. Since it's been a while for you, that was probably not the reason. Which one could be more affected by new stories, I am not sure -- it would be interesting to see if there are any differences in stories for kanji at the top and at the bottom of the 2nd list (the 1st one would not be good for such a comparison). I also wanted to check how different things affect the recall rate.

kanji recall rate statistics - laxxy - 2006-11-03

One other surprising character for me was {315}, whale, in the last list -- I was having a lot of problems with this character, and still can't recall it correctly even now, the problem being the ordering of primitives, as 'capital' usually appears on the left.

kanji recall rate statistics - leosmith - 2006-11-03

laxxy Wrote:
I wonder, which of the two 20-most-hardest kanji lists feels harder for you?

Hey, I just thought of something... I'm pretty sure supermemo will tell me my best & worst kanji. I'll try to check this out tonight. Can any other supermemo users shed some light on this, and perhaps share their best & worst? Also, I wonder if twinkle & mnemosyn (sp?) keep track of such things?

kanji recall rate statistics - Tatiana - 2006-11-03

Maybe "explanation" is so hard to remember because it's a simplified form? Originally, it was written as 釋, and the etymology of this character, according to the Zhongwen site, is "Differentiate from criminals under watch." It's interesting that a lot of people's stories have something to do with criminal investigation despite the shakuhachi and the animal tracks primitive meanings. I would be interested in knowing how many of the hard-to-remember kanji are simplified.

kanji recall rate statistics - ファブリス - 2006-11-03

Nice work, laxxy. Thank you for posting it.

Regarding "whale": the ordering of primitives/radicals is based on the "principal meaning"; the component indicating the principal meaning gets the prominent position. For example, you get a lot of state-of-mind kanji with the heart as the main component (the "state of mind" primitive is actually the heart compressed to the left, if I am not mistaken). If I recall correctly, most kanji have secondary primitives as phonetic components anyway, however meaningful they can seem at times. With "whale", the only component there that originally had any meaning value is the "fish". And so you get the "fish" primitive prominently on the left for all kinds of fish and aquatic mammals like carp, shark, fin, sardine, killer whale... 鯉 鱶 鰭 鰯 鯱. So in a way it works just like all the tree kanji, the flower kanji, the food kanji etc.

I think the "easiest" kanji list shows kanji that have a high imagery value (see the Mnemonic Effects of Imagery). In the "most difficult" list we see the opposite: keywords with very low imagery value. Even "summit" is somewhat abstract since it can be used in many contexts, tangible or intangible.

Interestingly, "explanation" and "recommend" are two characters which I have no difficulty recalling. I updated my story for "recommend" in the Study area, as I realised I had left out the prominent character, in the form of Uncle Jimbo from the South Park series. He is a deer hunter, and obviously that helped me remember the keyword easily. Similarly, for "explanation" I use the character "Lieutenant Columbo", who always finds an explanation at the end of the episodes of the TV series.

This got me thinking that displaying a list of the top difficult kanji (perhaps up to the top 100) would help concentrate community work on those stories, and it could be really interesting over time to see how, with our common effort, that list would update and how the kanji would move position.

To push this idea further, I realised that I could also display a list of the most failed kanji of the last day, based on the last day's reviews across all members! I ran a simple database query like this this week and found that I will need to update the database, because currently the hit/miss counts are accumulated from the very first review, regardless of the date. This shouldn't be too difficult. I think this could help even more to concentrate shared work on the stories that are most needed.

kanji recall rate statistics - laxxy - 2006-11-03

ファブリス Wrote:
Nice work, laxxy. Thank you for posting it.

That's a very good point; Dr. Heisig should have noted this in the book. He only has 'carp' in RTK1 (which I incidentally also had trouble with), and the 'fish' primitive is not quite strong enough in other characters to pick up the pattern.

Quote:
I think the "easiest" kanji list shows kanji that have a high imagery value (see the Mnemonic Effects of Imagery). In the "most difficult" list we see the opposite: keywords with very low imagery value. Even "summit" is somewhat abstract since it can be used in many contexts, tangible or intangible.

I agree, that has generally been the case for me too. One notable thing is that I do not see kanji from Lesson 52 (which was mostly irregular shapes) in the most-hard-to-remember lists, and I haven't had much trouble with them either. I'll test this. Perhaps visual memory is a fine tool when used in a limited way, and one should feel freer to use it for harder kanji.

Quote:
This got me thinking that displaying a list of the top difficult kanji (perhaps up to the top 100) would help concentrate community work on those stories, and it could be really interesting over time to see how, with our common effort, that list would update and how the kanji would move position.

Yes, that would be interesting.

Quote:
To push this idea further, I realised that I could also display a list of the most failed kanji of the last day, based on the last day's reviews across all members! I ran a simple database query like this this week and found that I will need to update the database, because currently the hit/miss counts are accumulated from the very first review, regardless of the date. This shouldn't be too difficult.

Yes, I think that would be interesting to see, and it could attract more story writers to where their attention is needed. If you also record story adding/revision times, one could even cross-reference this with the kanji failure database and do an analysis of the type Pepe is interested in.

kanji recall rate statistics - CharleyGarrett - 2006-11-03

I was sceptical about this topic, and I don't really get the math, but the list of hardest kanji pretty well matched the ones that I found hard, and the ones that were easy, I also found easy. It could indeed be fun and valuable to have a list of the hardest kanji, so we could collaborate on putting more effort into creating good stories. I know there were many times when I was having trouble, and then somebody came up with an unusually creative story that just really made it easy. Sometimes it just takes some creativity to break the wall down. Thanks, laxxy!

kanji recall rate statistics - laxxy - 2006-11-03

CharleyGarrett Wrote:
I was sceptical about this topic, and I don't really get the math, but the list of hardest kanji pretty well matched the ones that I found hard, and the ones that were easy, I also found easy. It could indeed be fun and valuable to have a list of the hardest kanji, so we could collaborate on putting more effort into creating good stories. I know there were many times when I was having trouble, and then somebody came up with an unusually creative story that just really made it easy. Sometimes it just takes some creativity to break the wall down.

You are welcome.
I wonder, which of the two 20-most-hardest kanji lists feels harder for you? Where did you have more trouble inventing good stories, on average? Dropping repeated kanji, it would be Recommend, Sew, Appear, Entrust, Remorse, Reputation, Suspend, Affinity, Praise, vs. Income, Rejoice, Hope, Achievement, Salvation, Envious, Adroit, Full, Quantity?

kanji recall rate statistics - cbogart - 2006-11-06

I wonder if kanji like "1" and "2" don't show up as easiest because they're the first ones in the book, and people that drive by the site to check it out may just be pressing buttons to experiment.

kanji recall rate statistics - PepeSeco - 2006-11-06

laxxy Wrote:
Pepe might also be interested in checking if he can find any systematic difference between the stories that are available for these 'best' and 'worst' kanji.

Great work laxxy! It is really interesting. I of course see the same problem now that I mentioned in the other thread, namely that without knowing what stories people are using it is difficult to analyse what makes the recall rate for one kanji be higher or lower. Nevertheless your report is inspiring! I will try to go through the difficult kanji and see if I see common elements. Some of them I have not reached yet, so I will also try to think of alternative stories. Thanks!

kanji recall rate statistics - ファブリス - 2006-11-06

cbogart Wrote:
I wonder if kanji like "1" and "2" don't show up as easiest because they're the first ones in the book, and people that drive by the site to check it out may just be pressing buttons to experiment.

The kanji for "one" ( 一 ) shows up in position 17 of the "best-remembered kanji" list. Believe it or not it was failed a whopping 45 times out of 3293 (based on data from last week)!

kanji recall rate statistics - synewave - 2006-11-06

I'll hold my hand up! A few weeks back I wrote 壱 for 'one'...
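
For a sense of scale, the figure quoted above for 一 (45 failures in 3293 reviews) works out to a failure rate of roughly 1.4%. A minimal sketch of that arithmetic in Python follows; the function name `failure_rate` is only an illustration and not something the site is known to expose.

```python
def failure_rate(fails: int, reviews: int) -> float:
    """Share of reviews that were failed, as a fraction between 0 and 1."""
    if reviews <= 0:
        raise ValueError("reviews must be positive")
    if not 0 <= fails <= reviews:
        raise ValueError("fails must be between 0 and reviews")
    return fails / reviews


# Figures quoted in the thread for the kanji "one" (一).
rate_one = failure_rate(45, 3293)
print(f"{rate_one:.2%}")
```

The same function applied per kanji would give the kind of failure-rate column the regression in the opening post appears to be built on, though the exact definition used there is not stated.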

kanji recall rate statistics - cbogart - 2006-11-06

I've been keeping track lately of the changes in the number of kanji in my rightmost pile. I'm just past 1000 this week, and I have a little over 600 in that pile. I've been failing some old ones I thought I knew, so I got concerned and started checking that number before and after adding the day's 10 new kanji and running through my orange piles. So far, the last pile has kept increasing, so I guess I'm still making progress. But that would be an interesting graph to see -- the size of stack four over time. I guess the record-keeping for that would be pretty onerous, though...

-chris

kanji recall rate statistics - ergerg - 2006-11-07

The statistics are quite interesting; I've sort of been curious what fraction of cards most people get right. I've been keeping track of my statistics (number of cards reviewed, number missed, and number in each pile) from near the beginning of when I started on this website, mainly as a motivational tool; it was nice in the learning phase to see the graph of learned cards going up.

I currently have 1879 cards in the top pile; my peak was 1907 (there's a definite 60-day oscillation in the number of cards in my top pile, having to do with when a bunch of cards come up for review). Over the last two months (so for one cycle), I've missed 11.7% of the cards that came up for review (I don't know if this is high or low compared to other people, but I'm being quite picky in marking cards right; if I'm not sure of it, or get anything wrong with the stroke order, I mark the card as wrong).

One thing I think leads to some needlessly missed cards is that a fair number of the names Heisig chose have odd overlaps with others, or are "cute" in ways which make them hard to remember (Dr. vs doctor; present vs presents come to mind immediately).
I think that if one were starting fresh you could make a fairly small number of changes that would eliminate a lot of needless mistakes (not taking anything away from Dr. Heisig; I'm a huge fan of his method of learning the kanji, and I think he had great insight in developing it).

kanji recall rate statistics - inuki - 2006-11-07

If I don't get 85% or better I'm not happy with myself, if that helps (I'm picky about it too). I add at least 10 cards a day, to meet my early December goal of finishing RTK1. I also use kanji gold offline most days.

kanji recall rate statistics - Mighty_Matt - 2006-11-07

Interesting stats indeed, even if I don't fully understand them all!!

I don't keep a detailed log of my passes/fails every day. I normally just use any piece of paper lying around to do my reviews, so I don't even have a way of going back and checking. But when reviewing I use a blue and a red pen. In blue I write down my answer and then check it. If it's right I move on; if not, I rewrite it correctly in red. This way I can easily see how I'm faring at a glance. For each stack I normally work out a rough percentage of fails, and if it's around 10% (or lower!) I'm happy.

Take today for example. The chapter I studied yesterday has 17 kanji (the arrow one. Can't remember what number). I just finished testing myself on it and I got two out of 17 wrong. I'm more than happy with that, because while I got two wrong, I got 15 right.

Ok, so I'm not sure this is anywhere on topic now, but thought I'd share...

kanji recall rate statistics - dingomick - 2007-03-04

I'd love to see a Top ~25 Most Missed Kanji for Site/Individual feature. I know this is old, but thanks for the work laxxy! Very interesting questions arise.
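
The "most missed" list requested here, and the per-day variant ファブリス describes earlier in the thread, could be computed along the following lines. This is only a sketch: the `Review` record, its field names, and the `most_missed` function are hypothetical, since the site's actual database schema is not shown anywhere in the thread. It ranks kanji by the number of failures inside a date window, which is one reading of "most failed"; ranking by failure rate would be an equally reasonable choice.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple


class Review(NamedTuple):
    """One flashcard review (hypothetical record layout)."""
    keyword: str   # Heisig keyword of the kanji reviewed
    day: date      # date of the review
    passed: bool   # True if the card was marked as remembered


def most_missed(reviews: Iterable[Review], since: date, top_n: int = 25):
    """Return (keyword, fails, total) for the most-failed kanji reviewed on or after `since`."""
    fails: Counter = Counter()
    totals: Counter = Counter()
    for r in reviews:
        if r.day < since:
            continue  # ignore reviews outside the window
        totals[r.keyword] += 1
        if not r.passed:
            fails[r.keyword] += 1
    # Most failures first; ties broken alphabetically for a stable order.
    ranked = sorted(fails.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(k, n, totals[k]) for k, n in ranked[:top_n]]


# Made-up sample data, purely for illustration.
sample = [
    Review("recommend", date(2006, 11, 2), False),
    Review("recommend", date(2006, 11, 3), False),
    Review("whale", date(2006, 11, 3), False),
    Review("whale", date(2006, 11, 3), True),
    Review("one", date(2006, 11, 3), True),
    Review("sew", date(2006, 10, 1), False),  # outside the window used below
]
top = most_missed(sample, since=date(2006, 11, 1), top_n=2)
```

Filtering by `since` is what makes a "last day" list possible; as noted earlier in the thread, counts accumulated from the very first review cannot be windowed this way, so each review would need to be stored with its date.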

kanji recall rate statistics - ericshun - 2007-03-05

The model of recall for someone learning through Heisig should be something like

R = a + b*(story strength) + c*(# of strokes) + d*(# of primitives)

Why did you choose to base it on frequency in the newspaper or school grade?

kanji recall rate statistics - laxxy - 2007-03-05

ericshun Wrote:
The model of recall for someone learning through Heisig should be something like

In addition to being interesting in themselves, those variables serve as controls -- not including them would lead to incorrect estimates here. If you are willing to go through the kanji list and enter the # of primitives for each, I'll gladly include that too and redo the analysis.

"Story strength" is the main elusive variable here, and naturally there is no good way to measure it. In addition, it would require a database that would tell us who used what story, and so far it does not exist.

One of the purposes of this exercise was to answer a related question: in all models estimated here, "story strength" is a part of the residual, and it is reasonable to expect that kanji with high (/low) residuals (i.e. those whose true failure rate is much higher (/lower) than the one predicted by the model) present relatively more (/less) difficulty to the story-makers, compared to similar ones. They are shown in the last two lists.

kanji recall rate statistics - kame3 - 2009-12-31

*Major Bump*

Perhaps it would be fun, after almost three years, to see the above statistics again and compare them. Is that possible, Fuaburisu?

Also, if possible, other statistics I would be interested in (I don't know if they can be obtained): How many of the people who get past 100 reviews (thereby excluding people who just try it for an instant) manage to finish RTK1? What is the average number of reviews when people finish? What is the average amount of time to finish? What is the highest number of reviews, and what is the average review count for people who are finished?
I think knowing stuff like this can help quantify how well the RTK method is working, and also how much time you need to put in to finish.
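
To make laxxy's residual argument from earlier in the thread more concrete, here is a small sketch of the missing-value dummy trick (`grade0`/`grademiss` in the `regress` command) and of reading residuals off a fitted linear model. Several things here are assumptions: the original analysis appears to have been run in a statistics package, whereas this version uses plain Python with a hand-rolled least-squares solver; the model is cut down to the school-grade variable only; and the failure rates are invented, not taken from the actual dataset.

```python
from typing import Optional


def fill_missing(value: Optional[float]) -> tuple:
    """Return (value or 0, missing flag), in the spirit of grade0/grademiss."""
    return (0.0, 1.0) if value is None else (float(value), 0.0)


def solve(a: list, b: list) -> list:
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear regressors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows: list, y: list) -> list:
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(rows: list, y: list, beta: list) -> list:
    """Actual minus predicted; large positive values fail more often than the model expects."""
    return [t - sum(b * x for b, x in zip(beta, r)) for r, t in zip(rows, y)]


# Invented example: school grade (None = no grade assigned) and failure rate.
grades = [1, 3, None, 5, None]
fail = [0.05, 0.10, 0.20, 0.15, 0.22]

# Design matrix columns: constant, grade0, grademiss.
rows = [[1.0, *fill_missing(g)] for g in grades]
beta = ols(rows, fail)
res = residuals(rows, fail, beta)
```

Setting the missing grade to 0 and adding the dummy lets the ungraded kanji stay in the sample without distorting the grade slope: their fitted value is simply their own group mean. In the full model the other regressors (frame number, stroke count, frequency with its own dummy, Part 1 indicator) would enter as additional columns, and sorting kanji by `res` would give lists of the kind laxxy describes as the hardest and easiest after controls.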