kanji koohii FORUM
kanji recall rate statistics


kanji recall rate statistics - laxxy - 2006-11-02

Combining the dataset on kanji review failure rates kindly provided by Fabrice with the kanji data in Edict, I constructed a dataset on the RTK1 kanji, consisting of:
Code:
obs:         2,042                        
vars:             9                        
size:        98,016 (99.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
framenum        int    %8.0g                  Heisig No.
sfc             int    %8.0g                  Failures
ssc             int    %8.0g                  Successes
str             int    %8.0g                  Reviews
tt              float  %9.0g                  Failure rate
frequency       int    %8.0g                  Frequency
grade           byte   %8.0g                  Grade
strokecount     byte   %8.0g                  Strokes
englishmeaning  str28  %28s                  
-------------------------------------------------------------------------------
Here are a few simple numbers.

First, some summary statistics:
Code:
. su sfc ssc str tt frequency grade strokecount

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sfc |      2042     141.454    113.8418          3        848
         ssc |      2042    723.1572    597.9186        132       3293
         str |      2042    864.5916    672.2959        169       3338
          tt |      2042    .1833305    .0793473   .0037594   .3950617
   frequency |      2007    1028.071    616.4795          1       2495
-------------+--------------------------------------------------------
       grade |      2004    5.915669    2.408873          1          9
 strokecount |      2042    10.30852    3.782139          1         23
And the correlation matrix:
Code:
. cor sfc ssc str tt frequency grade strokecount
(obs=1979)

             |      sfc      ssc      str       tt freque~y    grade stroke~t
-------------+---------------------------------------------------------------
         sfc |   1.0000
         ssc |   0.5955   1.0000
         str |   0.6991   0.9907   1.0000
          tt |   0.2847  -0.4171  -0.3231   1.0000
   frequency |   0.0813  -0.1552  -0.1244   0.3182   1.0000
       grade |   0.1483  -0.2756  -0.2202   0.5346   0.6821   1.0000
 strokecount |   0.0726  -0.3328  -0.2839   0.5087   0.2044   0.3323   1.0000
10 best-remembered kanji, best to worst:
Code:
     +-------------------------------------------------------------------------+
     | framenum   sfc    ssc    str   freque~y   grade   stroke~t   englishm~g |
     |-------------------------------------------------------------------------|
  1. |      768     3    795    798        131       1          3     mountain |
  2. |      286     9   1516   1525        333       1          7          car |
  3. |      595     7    920    927        157       2          4        heart |
  4. |      195    14   1751   1765        317       1          4         tree |
  5. |      951     6    618    624          5       1          2       person |
     |-------------------------------------------------------------------------|
  6. |     1616     3    301    304        452       2          8        gates |
  7. |      107    25   2048   2073          7       1          3        large |
  8. |        2    40   3265   3305          9       1          2          two |
  9. |        3    41   3235   3276         14       1          3        three |
 10. |       14    39   3015   3054         90       1          5   rice field |
     +-------------------------------------------------------------------------+
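
The failure rate (tt) appears to be simply failures divided by total reviews. A minimal Python sketch (not the Stata session used in this post) that recomputes it from the sfc and str columns of the list above; the helper name failure_rate is only for illustration:

```python
# Rows copied from the best-remembered list above: (framenum, sfc, str, keyword).
best = [
    (768, 3, 798, "mountain"),
    (286, 9, 1525, "car"),
    (595, 7, 927, "heart"),
    (195, 14, 1765, "tree"),
    (951, 6, 624, "person"),
    (1616, 3, 304, "gates"),
    (107, 25, 2073, "large"),
    (2, 40, 3305, "two"),
    (3, 41, 3276, "three"),
    (14, 39, 3054, "rice field"),
]


def failure_rate(failures, reviews):
    """Share of reviews that were failed (assumed definition of tt)."""
    return failures / reviews


# Sorting by the recomputed rate should reproduce the order of the list.
ranked = sorted(best, key=lambda row: failure_rate(row[1], row[2]))

for framenum, sfc, reviews, keyword in ranked:
    print(f"{framenum:5d}  {keyword:<12s} {failure_rate(sfc, reviews):.7f}")
```

For 'mountain' this gives 3/798, about 0.0037594, which matches the minimum of tt in the summary statistics.
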
And 20 worst-remembered, worst to best:
Code:
     +--------------------------------------------------------------------------+
     | framenum   sfc    ssc    str   freque~y   grade   stroke~t   englishme~g |
     |--------------------------------------------------------------------------|
  1. |     1908    96    147    243       1097       8         11   explanation |
  2. |     2000    89    137    226       1082       8         16     recommend |
  3. |     1563   143    221    364       1723       8         16           sew |
  4. |     1789   116    182    298       1536       8         18        appear |
  5. |     1733   115    182    297       1549       8         11        solemn |
     |--------------------------------------------------------------------------|
  6. |     1914    89    144    233          .       8         10      decrease |
  7. |     1577   135    220    355        830       6         12     diligence |
  8. |      766   380    622   1002       1278       8         10         tempt |
  9. |     1954    83    138    221       1903       8         15       entrust |
 10. |      631   437    731   1168       1682       8         16       remorse |
     |--------------------------------------------------------------------------|
 11. |     1939    78    135    213       1064       8         13    reputation |
 12. |     1394   160    282    442        889       8         20       suspend |
 13. |     1562   139    247    386       1836       8         10        summit |
 14. |     1803    98    176    274       1737       8         10       respect |
 15. |     1372   157    283    440       1291       8         15      affinity |
     |--------------------------------------------------------------------------|
 16. |      387   587   1060   1647        897       8         12       surpass |
 17. |     1570   126    228    354       1281       8         10      peaceful |
 18. |      998   251    455    706       2073       8         15        praise |
 19. |     1969    76    138    214        624       6         12    concerning |
 20. |     1841    97    178    275        905       8          8     residence |
     +--------------------------------------------------------------------------+
 
There are quite a few kanji with a high Heisig number here, especially in the top 10. That could be because Dr. Heisig left the harder kanji for later in the text, or because people simply haven't learned them well enough yet; there could be many other explanations as well. Here is the failure rate of these 20 kanji plotted against Heisig number:

Code:
 .395062 +
         |                                                             *
         |                                               *                 *
         |
    F    |                                                        *
    a    |                                                      *
    i    |
    l    |                                                             *
    u    |                                                *
    r    |                *
    e    |                                                               *
         |          *
    r    |
    a    |
    t    |                                                              *
    e    |
         |                                        *
         |                                               *
         |                                                         *
         | *                       *              *      *               *
 .352727 +                                                          *
          +----------------------------------------------------------------+
              387                   Heisig No.                       2000
One also wonders, first, how the kanji difficulty is affected by the predictable factors, and second, what are the most difficult kanji after correcting for those factors -- i.e. what kanji seem more difficult than they should be.
It is possible that those would be the kanji that could benefit from better stories the most.


First, let's fit a simple linear model:

Code:
RECALL = b0+b1*FRAMENUM+b2*GRADE+b3*STROKECOUNT+b4*FREQUENCY+eps

. regress tt framenum grade stroke frequency

      Source |       SS       df       MS              Number of obs =    1979
-------------+------------------------------           F(  4,  1974) =  386.69
       Model |  5.45293192     4  1.36323298           Prob > F      =  0.0000
    Residual |  6.95910812  1974  .003525384           R-squared     =  0.4393
-------------+------------------------------           Adj R-squared =  0.4382
       Total |    12.41204  1978  .006275046           Root MSE      =  .05937

------------------------------------------------------------------------------
          tt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    framenum |   .0000235   2.38e-06     9.86   0.000     .0000188    .0000281
       grade |   .0142809    .000791    18.05   0.000     .0127297    .0158321
 strokecount |   .0069658   .0003857    18.06   0.000     .0062094    .0077221
   frequency |  -8.71e-06   2.98e-06    -2.92   0.004    -.0000146   -2.86e-06
       _cons |   .0123982   .0045974     2.70   0.007      .003382    .0214144
------------------------------------------------------------------------------
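As we can see, all the coefficients are highly significant. Three have the expected sign: kanji that appear later in the book, are taught in a higher Japanese school grade, or have more strokes are harder to recall. The frequency coefficient is negative, though. Assuming the Edict frequency figure is a rank (1 = most common; 'person' has 5 and 'two' has 9 above), rarer kanji come out slightly easier to recall once the other factors are held fixed, even though the raw correlation between frequency and failure rate (0.32) points the other way.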
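
To make the fitted model concrete, here is a small Python sketch (again not the Stata session itself) that plugs one kanji into the estimated equation. The coefficients are copied from the table above; the function name is made up for illustration.

```python
# Coefficients copied from the regression table above (first model).
COEF = {
    "_cons": 0.0123982,
    "framenum": 0.0000235,
    "grade": 0.0142809,
    "strokecount": 0.0069658,
    "frequency": -8.71e-06,
}


def predicted_failure_rate(framenum, grade, strokecount, frequency):
    """Fitted value b0 + b1*FRAMENUM + b2*GRADE + b3*STROKECOUNT + b4*FREQUENCY."""
    return (
        COEF["_cons"]
        + COEF["framenum"] * framenum
        + COEF["grade"] * grade
        + COEF["strokecount"] * strokecount
        + COEF["frequency"] * frequency
    )


# "tempt" (frame 766) from the worst-remembered list:
# grade 8, 10 strokes, frequency 1278, 380 failures in 1002 reviews.
fitted = predicted_failure_rate(766, 8, 10, 1278)
observed = 380 / 1002

print(f"fitted   {fitted:.4f}")
print(f"observed {observed:.4f}")
print(f"residual {observed - fitted:.4f}")
```

The fitted rate comes out near 0.20 against an observed rate near 0.38, so "tempt" is failed far more often than its frame number, grade, stroke count and frequency would suggest.
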
This model misses a few kanji, because some of them either have no school grade assigned or do not appear in the newspaper frequency dataset. I introduce two dummies (GRADEMISS and FREQMISS, equal to 1 for the missing observations and 0 elsewhere) to account for those. In addition, I add two more variables: dummies for the kanji in Part 1 and Part 2 of the Heisig book (the first part has detailed stories by Dr. Heisig, and the second has less detailed ones; iirc, the first part is also available online):

Code:
RECALL = b0+b1*FRAMENUM+b2*GRADE+b3*STROKECOUNT+b4*FREQUENCY+
         b5*GRADEMISS+b6*FREQMISS+b7*PT1+b8*PT2+eps

. regress tt framenum grade0 grademiss stroke freq0 freqmiss pt1 pt2

      Source |       SS       df       MS              Number of obs =    2042
-------------+------------------------------           F(  8,  2033) =  198.27
       Model |  5.63176092     8  .703970115           Prob > F      =  0.0000
    Residual |  7.21836221  2033  .003550596           R-squared     =  0.4383
-------------+------------------------------           Adj R-squared =  0.4361
       Total |  12.8501231  2041  .006295994           Root MSE      =  .05959

------------------------------------------------------------------------------
          tt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    framenum |   .0000201   3.48e-06     5.78   0.000     .0000133    .0000269
      grade0 |   .0141365   .0007917    17.86   0.000     .0125838    .0156893
   grademiss |   .0782822   .0118053     6.63   0.000     .0551303     .101434
 strokecount |   .0070189   .0003824    18.36   0.000      .006269    .0077688
       freq0 |  -8.70e-06   2.97e-06    -2.93   0.003    -.0000145   -2.88e-06
    freqmiss |  -.0236515   .0114678    -2.06   0.039    -.0461413   -.0011617
         pt1 |  -.0098122   .0055459    -1.77   0.077    -.0206884    .0010641
         pt2 |  -.0009667    .005195    -0.19   0.852    -.0111547    .0092213
       _cons |   .0175479   .0060571     2.90   0.004     .0056692    .0294267
------------------------------------------------------------------------------
Interestingly, the kanji NOT in the newspaper frequency dataset appear easier to recall than the others (freqmiss is negative, p = 0.039). Perhaps this is because, for a rare kanji to be included in the 常用 set, it had to have a particularly simple structure or be otherwise easy. Kanji from Pt.1 of RTK also appear somewhat easier than the rest, though only marginally so (p = 0.077), and there is no detectable effect for those in Pt.2 (p = 0.852).
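
The missing-value dummies could be built along the following lines. This is a Python sketch under an assumption: that grade0 and freq0 are the original variables with missing values set to 0, and that grademiss and freqmiss flag exactly those rows. The helper name fill_with_dummy is hypothetical.

```python
def fill_with_dummy(value):
    """Return (filled_value, missing_flag) for one possibly-missing value.

    Assumption: the "0" variables hold the original value with missings
    replaced by 0, and the "miss" dummies are 1 exactly on those rows.
    """
    if value is None:
        return 0, 1
    return value, 0


# Three rows taken from the lists in this post; None stands for Stata's '.'.
rows = [
    {"keyword": "tempt", "grade": 8, "frequency": 1278},
    {"keyword": "decrease", "grade": 8, "frequency": None},
    {"keyword": "envious", "grade": None, "frequency": None},
]

for row in rows:
    row["grade0"], row["grademiss"] = fill_with_dummy(row["grade"])
    row["freq0"], row["freqmiss"] = fill_with_dummy(row["frequency"])
    print(row["keyword"], row["grade0"], row["grademiss"], row["freq0"], row["freqmiss"])
```

Coded this way, rows with a missing grade or frequency stay in the sample (2042 observations instead of 1979), and the dummy coefficient picks up the average difference for them.
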

Dropping Pt.2 from the model, here are the estimation results:
Code:
. regress tt framenum grade0 grademiss stroke freq0 freqmiss pt1

      Source |       SS       df       MS              Number of obs =    2042
-------------+------------------------------           F(  7,  2034) =  226.69
       Model |  5.63163798     7  .804519711           Prob > F      =  0.0000
    Residual |  7.21848515  2034  .003548911           R-squared     =  0.4383
-------------+------------------------------           Adj R-squared =  0.4363
       Total |  12.8501231  2041  .006295994           Root MSE      =  .05957

------------------------------------------------------------------------------
          tt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    framenum |   .0000205   2.82e-06     7.27   0.000      .000015     .000026
      grade0 |   .0141403   .0007913    17.87   0.000     .0125884    .0156921
   grademiss |   .0783013   .0118021     6.63   0.000     .0551559    .1014468
 strokecount |   .0070126   .0003808    18.42   0.000     .0062658    .0077594
       freq0 |  -8.71e-06   2.97e-06    -2.94   0.003    -.0000145   -2.89e-06
    freqmiss |  -.0237389   .0114554    -2.07   0.038    -.0462045   -.0012732
         pt1 |  -.0093114    .004848    -1.92   0.055     -.018819    .0001962
       _cons |   .0170344   .0053907     3.16   0.002     .0064626    .0276063
------------------------------------------------------------------------------
And the list of 20 kanji that appear most difficult after correcting for the above factors:
Code:
     +-------------------------------------------------------------------+
     | framenum   sfc    ssc   stroke~t   grade   freque~y   englishme~g |
     |-------------------------------------------------------------------|
  1. |      766   380    622         10       8       1278         tempt |
  2. |     1577   135    220         12       6        830     diligence |
  3. |     1914    89    144         10       8          .      decrease |
  4. |     1510   137    279          5       6        337        income |
  5. |     1445   131    250         12       4        769       rejoice |
     |-------------------------------------------------------------------|
  6. |     1489   119    267          7       4        896          hope |
  7. |      863   248    633          5       4        857   achievement |
  8. |     1908    96    147         11       8       1097   explanation |
  9. |     1733   115    182         11       8       1549        solemn |
 10. |      936   220    494         11       4        799     salvation |
     |-------------------------------------------------------------------|
 11. |     1562   139    247         10       8       1836        summit |
 12. |      387   587   1060         12       8        897       surpass |
 13. |      553   382    844         13       .          .       envious |
 14. |     1241   157    342          5       8       1537        adroit |
 15. |     1169   183    399         12       4        515          full |
     |-------------------------------------------------------------------|
 16. |     1841    97    178          8       8        905     residence |
 17. |     1803    98    176         10       8       1737       respect |
 18. |      177   608   1538         12       4        469      quantity |
 19. |     1570   126    228         10       8       1281      peaceful |
 20. |     1969    76    138         12       6        624    concerning |
     +-------------------------------------------------------------------+
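
"Most difficult after correcting" here means the largest positive residual: observed failure rate minus the rate the model predicts. The Python sketch below recomputes that for four kanji from the list, using the coefficients of the last model. Two things are assumptions on my part: missing grade or frequency enters as 0 plus its dummy, and pt1 is 0 for these four frames.

```python
# Coefficients of the last model (the one without pt2), copied from its table.
COEF = {
    "_cons": 0.0170344,
    "framenum": 0.0000205,
    "grade0": 0.0141403,
    "grademiss": 0.0783013,
    "strokecount": 0.0070126,
    "freq0": -8.71e-06,
    "freqmiss": -0.0237389,
    "pt1": -0.0093114,
}

# (keyword, framenum, sfc, ssc, strokecount, grade, frequency); None = missing.
# Deliberately listed out of order.
kanji = [
    ("income", 1510, 137, 279, 5, 6, 337),
    ("tempt", 766, 380, 622, 10, 8, 1278),
    ("decrease", 1914, 89, 144, 10, 8, None),
    ("diligence", 1577, 135, 220, 12, 6, 830),
]


def residual(row, pt1=0):
    """Observed minus fitted failure rate.

    Assumptions: a missing grade/frequency contributes only its dummy
    coefficient, and pt1 = 0 because these frames are taken to lie
    outside the opening part of the book.
    """
    _, framenum, sfc, ssc, strokes, grade, freq = row
    fitted = (
        COEF["_cons"]
        + COEF["framenum"] * framenum
        + COEF["strokecount"] * strokes
        + (COEF["grademiss"] if grade is None else COEF["grade0"] * grade)
        + (COEF["freqmiss"] if freq is None else COEF["freq0"] * freq)
        + COEF["pt1"] * pt1
    )
    return sfc / (sfc + ssc) - fitted


# Largest positive residual first: harder than the model predicts.
for row in sorted(kanji, key=residual, reverse=True):
    print(f"{row[0]:<10s} {residual(row):.4f}")
```

Sorted this way, the four come out as tempt, diligence, decrease, income, the same relative order as in the list.
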
 

As we can see these are somewhat more evenly distributed over the book length:

Code:
 .395062 +
         |                                                              *
         |
         |                                                        *
    F    |                      *                            *           *
    a    |
    i    |
    l    |                                                  *
    u    |        *                                         *        *     *
    r    |                                                            *
    e    |                                              *
         |
    r    |                                                *
    a    |
    t    |
    e    |              *                     *  *
         |                            *                  *
         |
         |
         |
 .281498 + *                       *
          +----------------------------------------------------------------+
              177                   Heisig No.                       1969
Now the interesting question is, of course, what can we do to make those kanji easier to recall.
Probably, the best thing would be if we could think of what stories we ourselves used there, whether they helped, and if they did, whether they could be shared with the community.

For comparison, here is also a list of 10 easiest kanji after controlling for the factors in the last model:
Code:
     +---------------------------------------------------------------------+
     | framenum   sfc    ssc   stroke~t   grade   freque~y   englishmean~g |
     |---------------------------------------------------------------------|
  1. |      593    28    853         11       8       1142            hemp |
  2. |      315   148   1334         19       8       1486           whale |
  3. |      551   105    925         17       8        355           fresh |
  4. |     1766    21    206         13       8       1609              Go |
  5. |     1869    16    179         11       8       1753           liner |
     |---------------------------------------------------------------------|
  6. |     1623    18    293         11       6        951          closed |
  7. |     1806    12    243          4       8        339            well |
  8. |      518    55    943         11       8       2031   lightning-bug |
  9. |     2003    25    168         14       9       1105            bear |
 10. |     1944    10    209         10       .       2042            crow |
     +---------------------------------------------------------------------+

Pepe might also be interested in checking if he can find any systematic difference between the stories that are available for these 'best' and 'worst' kanji.


kanji recall rate statistics - leosmith - 2006-11-02

Cool post laxxy! A couple of comments. I found it interesting that "one" didn't make the easy lists. As for the difficult lists, I find the 20 in your first list harder to remember than the ones in your second list. I finished book one almost a year ago, and have had all the flashcards in supermemo for about 8 months now. I'm not judging your adjustments, but I thought you might want to know. I found it very frustrating that I couldn't get my 90% on either of the hard lists, so I believe these really are some of the hardest. Thanks a lot - this stuff is really interesting to me.


kanji recall rate statistics - laxxy - 2006-11-02

leosmith Wrote:Cool post laxxy! A couple of comments. I found it interesting that "one" didn't make the easy lists. As for the difficult lists, I find the 20 in your first list harder to remember than the ones in your second list. I finished book one almost a year ago, and have had all the flashcards in supermemo for about 8 months now. I'm not judging your adjustments, but I thought you might want to know. I found it very frustrating that I couldn't get my 90% on either of the hard lists, so I believe these really are some of the hardest. Thanks a lot - this stuff is really interesting to me.
That's a very good point, I am actually not sure which list could benefit more. I wanted to make a small poll on a non-overlapping section of the two lists at first. For me, too, the first list seemed harder, but I thought it could be just because I was at the end of the book myself. Since it's been a while for you, that was probably not the reason.

Which one could be more affected by new stories, I am not sure -- it would be interesting to see if there are any differences in stories for kanji at the top and at the bottom of the 2nd list (the 1st one would not be good for such a comparison). I also wanted to check how different things affect the recall rate.


kanji recall rate statistics - laxxy - 2006-11-03

One other surprising character for me was {315}, whale, in the last list -- I was having a lot of problems with this character, and still can't recall it correctly, the problem being the ordering of primitives, as 'capital' usually appears on the left.


kanji recall rate statistics - leosmith - 2006-11-03

laxxy Wrote:I wonder, which of the two 20-most-hardest kanji lists feels harder for you?
Hey, I just thought of something....I'm pretty sure supermemo will tell me my best & worst kanji. I'll try to check this out tonight. Can any other supermemo users shed some light on this, and perhaps share their best & worst? Also, I wonder if twinkle & mnemosyn (sp?) keep track of such things?


kanji recall rate statistics - Tatiana - 2006-11-03

Maybe explanation is so hard to remember because it's a simplified form?

Originally, it was written as 釋, and the etymology of this character, according to the Zhongwen site, is "Differentiate from criminals under watch." It's interesting that a lot of people's stories have something to do with criminal investigation despite the shakuhachi and the animal tracks primitive meanings.

I would be interested in knowing how many of the hard-to-remember kanji are simplified.


kanji recall rate statistics - ファブリス - 2006-11-03

Nice work, laxxy. Thank you for posting it.

Regarding "whale" the ordering of primitives/radicals is based on the "principal meaning", the component indicating the principal meaning gets the prominent position.

For example you get a lot of state of mind kanji with the heart as main component (the "state of mind" primitive is actually the heart compressed to the left, if I am not mistaken).

If I recall well, most kanji have secondary primitives as phonetic components anyway, however meaningful they can seem at times.

With "whale" the only component there that originally had any meaning value is the "fish". And so you get the "fish" primitive prominently on the left for all kinds of fish and aquatic mammals like carp, shark, fin, sardine, killer whale... 鯉 鱶 鰭 鰯 鯱. So in a way it works just like all the tree kanji, the flower kanji, the food kanji etc.

I think the "easiest" kanji list shows kanji that have a high imagery value (see the Mnemonic Effects of Imagery). In the "most difficult" list we see the opposite : keywords with very low imagery value. Even "summit" is somewhat abstract since it can be used in many contexts tangible or intangible.

Interestingly, "explanation" and "recommend" are two characters which I have no difficulty recalling. I updated my story for "recommend" in the Study area as I realised I had left out the prominent character in the form of Uncle Jimbo from the South Park series. He is a deer hunter, and obviously that helped me remember the keyword easily. Similarly for "explanation" I use the character "Lieutenant Columbo" who always finds an explanation at the end of the episodes of the TV series.

This got me thinking that to display a list of the top difficult kanji (perhaps up to the top 100) would help concentrate community work on those stories, and it could be really interesting over time to see how with our common effort that list would update and how the kanji would move position.

To push this idea further, I realised that I could also display a list of the most failed kanji of the last day, based on the last day's reviews across all members! I ran a simple database query like this this week and found that I will need to update the database, because currently the hit/miss counts are counted from the very first review, regardless of the date. This shouldn't be too difficult. I think this could help concentrate shared work even more on the stories that are most needed.
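
The point about dates can be sketched as follows. This is only an illustration in Python with a made-up review log; it is not the site's actual database schema or query, and the name most_failed is hypothetical. The idea is that each review needs its own date, so that failures can be counted inside a time window instead of from the very first review.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical per-review log: (framenum, review_date, passed).
reviews = [
    (766, date(2006, 11, 2), False),
    (766, date(2006, 11, 2), False),
    (766, date(2006, 11, 2), True),
    (1908, date(2006, 11, 2), False),
    (768, date(2006, 11, 2), True),
    (1908, date(2006, 10, 20), False),  # too old, falls outside the window
]


def most_failed(reviews, today, days=1, top=100):
    """Return (framenum, failures) pairs for reviews dated within `days` days of `today`."""
    cutoff = today - timedelta(days=days)
    fails = Counter(
        framenum
        for framenum, day, passed in reviews
        if cutoff <= day <= today and not passed
    )
    # most_common sorts by failure count, highest first.
    return fails.most_common(top)


print(most_failed(reviews, today=date(2006, 11, 3)))
```

With only cumulative hit/miss counters per kanji, the old failure of frame 1908 could not be told apart from the recent one, which is why the per-review date is needed.
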


kanji recall rate statistics - laxxy - 2006-11-03

ファブリス Wrote:Nice work, laxxy. Thank you for posting it.

Regarding "whale" the ordering of primitives/radicals is based on the "principal meaning", the component indicating the principal meaning gets the prominent position.

For example you get a lot of state of mind kanji with the heart as main component (the "state of mind" primitive is actually the heart compressed to the left, if I am not mistaken).

If I recall well, most kanji have secondary primitives as phonetic components anyway, however meaningful they can seem at times.

With "whale" the only component there that originally had any meaning value is the "fish". And so you get the "fish" primitive prominently on the left for all kinds of fish and aquatic mammals like carp, shark, fin, sardine, killer whale... 鯉 鱶 鰭 鰯 鯱. So in a way it works just like all the tree kanji, the flower kanji, the food kanji etc.
That's a very good point; Dr. Heisig should have noted this in the book. He only has 'carp' in RTK1 (which I incidentally also had trouble with), and the 'fish' primitive is not quite prominent enough in other characters for one to pick up the pattern.

Quote:I think the "easiest" kanji list shows kanji that have a high imagery value (see the Mnemonic Effects of Imagery). In the "most difficult" list we see the opposite : keywords with very low imagery value. Even "summit" is somewhat abstract since it can be used in many contexts tangible or intangible.
I agree, that has generally been the case for me too.
One notable thing is that I do not see kanji from Lesson 52 (which was mostly irregular shapes) in the most-hard-to-remember lists, and I haven't had much trouble with them either. I'll test this. Perhaps visual memory is a fine tool when used in a limited way, and one should feel more free using it for harder kanji.

Quote:This got me thinking that to display a list of the top difficult kanji (perhaps up to the top 100) would help concentrate community work on those stories, and it could be really interesting over time to see how with our common effort that list would update and how the kanji would move position.
Yes, that would be interesting.

Quote:To push this idea further I realised that I could also display a list of the most failed kanji of the last day, based on the last day's reviews across all members! I did a simple database query like this this week and found that I will need to update the database, because currently the hit/miss counts are counted from the very first review, regardless of the date. This shouldn't be too difficult.
Yes, I think that would be interesting to see and it could attract more story writers where their attention is needed.
If you also record story adding/revision times, one could even cross-reference this with the kanji failure database and do an analysis of the type Pepe is interested in.


kanji recall rate statistics - CharleyGarrett - 2006-11-03

I was sceptical about this topic, and I don't really get the math, but the list of hardest kanji pretty well matched the ones that I found hard, and the ones that were easy, I also found easy. It could indeed be fun and valuable to have a list of the hardest kanji, so we could collaborate on providing more effort on creating good stories. I know there were many times when I was having trouble, and then somebody came up with an unusually creative story that just really made it easy. Sometimes it just takes some creativity to break the wall down.

Thanks, laxxy!


kanji recall rate statistics - laxxy - 2006-11-03

CharleyGarrett Wrote:I was sceptical about this topic, and I don't really get the math, but the list of hardest kanji pretty well matched the ones that I found hard, and the ones that were easy, I also found easy. It could indeed be fun and valuable to have a list of the hardest kanji, so we could collaborate on providing more effort on creating good stories. I know there were many times when I was having trouble, and then somebody came up with an unusually creative story that just really made it easy. Sometimes it just takes some creativity to break the wall down.

Thanks, laxxy!
You are welcome.
I wonder, which of the two 20-most-hardest kanji lists feels harder for you? Where did you have more trouble inventing good stories, on average?
Dropping repeated kanji, it would be

Recommend, Sew, Appear, Entrust, Remorse, Reputation, Suspend, Affinity, Praise, vs.

Income, Rejoice, Hope, Achievement, Salvation, Envious, Adroit, Full, Quantity?


kanji recall rate statistics - cbogart - 2006-11-06

I wonder if kanji like "1" and "2" don't show up as easiest because they're the first ones in the book, and people that drive by the site to check it out may just be pressing buttons to experiment.


kanji recall rate statistics - PepeSeco - 2006-11-06

laxxy Wrote:Pepe might also be interested in checking if he can find any systematic difference between the stories that are available for these 'best' and 'worst' kanji.
Great work laxxy! It is really interesting. I of course see the same problem now that I mentioned in the other thread, namely that without knowing what stories people are using it is difficult to analyse what makes the recall rate for one kanji be higher or lower. Nevertheless your report is inspiring! I will try to go through the difficult kanji and see if I see common elements. Some of them I have not reached yet, so I will also try to think of alternative stories. Thanks!


kanji recall rate statistics - ファブリス - 2006-11-06

cbogart Wrote:I wonder if kanji like "1" and "2" don't show up as easiest because they're the first ones in the book, and people that drive by the site to check it out may just be pressing buttons to experiment.
The kanji for "one" ( 一 ) shows up in position 17 of the "best-remembered kanji" list. Believe it or not it was failed a whopping 45 times out of 3293 (based on data from last week)!


kanji recall rate statistics - synewave - 2006-11-06

I'll hold my hand up! A few weeks back I wrote 壱 for 'one'...


kanji recall rate statistics - cbogart - 2006-11-06

I've been keeping track lately of the changes in the number of kanji in my rightmost pile. I'm just past 1000 this week, and I have a little over 600 in that pile. I've been failing some old ones I thought I knew, so I got concerned and started checking that number before and after adding the day's 10 new kanji and running through my orange piles.

So far, the last pile has kept increasing, so I guess I'm still making progress. But that would be an interesting graph to see -- the size of stack four over time. I guess the record-keeping for that would be pretty onerous, though...

-chris


kanji recall rate statistics - ergerg - 2006-11-07

The statistics are quite interesting, I've sort of been curious what fraction of cards most people get right.

I've been keeping track of my statistics (number of cards reviewed, number missed, and number in each pile) from near the beginning of when I started on this website, mainly as a motivational tool, it was nice in the learning phase to see the graph of learned cards going up.

I currently have 1879 cards in the top pile, my peak was 1907 (there's a definite 60 day oscillation in the number of cards in my top pile having to do with when a bunch of cards come up for review). Over the last two months (so for one cycle), I've missed 11.7% of the cards that came up for review (I don't know if this is high or low compared to other people, but I'm being quite picky in marking cards right; if I'm not sure of it, or get anything wrong with the stroke order I mark the card as wrong).

One thing I think leads to some needlessly missed cards is that a fair number of the names Heisig chose have odd overlaps with others, or are "cute" in ways which make them hard to remember (Dr. vs doctor; present vs presents come to mind immediately). I think that if one were starting fresh you could make a fairly small number of changes that would eliminate a lot of needless mistakes (not taking anything away from Dr. Heisig, I'm a huge fan of his method of learning the kanji, and I think he had great insight in developing it).


kanji recall rate statistics - inuki - 2006-11-07

If I don't get 85% or better I'm not happy with myself, if that helps (I'm picky about it too). I add at least 10 cards a day, to meet my early December goal of finishing RTK1. I also use kanji gold offline most days.


kanji recall rate statistics - Mighty_Matt - 2006-11-07

Interesting stats indeed, even if I don't fully understand them all!!

I don't keep a detailed log of my passes/fails every day. I normally just use any piece of paper lying around to do my reviews, so I don't even have a way of going back and checking. But when reviewing I use a blue and red pen. In blue I write down my answer and then check it. If it's right I move on, if not, I rewrite it correctly in red. This way I can easily see how I'm faring at a glance. For each stack I normally work out a rough percentage of fails and if it's around 10% (or lower!) I'm happy. Take today for example. The chapter I studied yesterday has 17 kanji (the arrow one. Can't remember what number). I just finished testing myself on it and I got two out of 17 wrong. I'm more than happy with that, because while I got two wrong, I got 15 right ;).

Ok, so I'm not sure this is anywhere on topic now, but thought I'd share...


kanji recall rate statistics - dingomick - 2007-03-04

I'd love to see a Top ~25 Most Missed Kanji for Site/Individual feature. I know this is old, but thanks for the work laxxy! Very interesting questions arise.


kanji recall rate statistics - ericshun - 2007-03-05

The model of recall for someone learning through Heisig should be something like

R = a + b*(story strength) + c*(# of strokes) + d*(# of primitives)

Why did you choose to base it on frequency in the newspaper or school grade?


kanji recall rate statistics - laxxy - 2007-03-05

ericshun Wrote:The model of recall for someone learning through Heisig should be something like

R = a + b*(story strength) + c*(# of strokes) + d*(# of primitives)

Why did you choose to base it on frequency in the newspaper or school grade?
In addition to being interesting in themselves, those variables serve as controls -- not including them would lead to incorrect estimates here. If you are willing to go through the kanji list and enter the # of primitives for each, I'll gladly include that too and redo the analysis.

"Story strength" is the main elusive variable here, and naturally there is no good way to measure it. In addition, it would require a database that would tell us who used what story, and so far it does not exist. One of the purposes of this exercise was to answer a related question: in all models estimated here "story strength" is a part of the residual, and it is reasonable to expect that kanji with high (/low) residuals (i.e. those whose true failure rate is much higher (/lower) than the one predicted by the model) present relatively more (/less) difficulty to the story-makers, compared to similar ones. They are shown in the last two lists.
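
To illustrate the residual idea, here is a rough sketch in Python rather than the Stata used for the actual analysis. It fits an ordinary least-squares model of failure rate on two controls and ranks kanji by residual. All data are synthetic, the coefficients are invented, and `fit_residuals` is a hypothetical helper; two outliers are planted at indices 7 and 42 to stand in for kanji with unusually weak and unusually strong stories.

```python
import numpy as np


def fit_residuals(X, y):
    """OLS of y on X with an intercept; returns (coefficients, residuals)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef


rng = np.random.default_rng(0)
n = 200

# Synthetic controls (not the real dataset): stroke count and school grade.
strokes = rng.integers(1, 24, size=n).astype(float)
grade = rng.integers(1, 9, size=n).astype(float)
X = np.column_stack([strokes, grade])

# Synthetic failure rate: linear in the controls plus small noise.
y = 0.15 + 0.006 * strokes + 0.004 * grade + rng.normal(0, 0.01, size=n)
y[7] += 0.1    # planted "hard to make a story for" kanji
y[42] -= 0.1   # planted "easy to make a story for" kanji

coef, resid = fit_residuals(X, y)

order = np.argsort(resid)
hardest = order[::-1][:5]   # largest positive residuals
easiest = order[:5]         # largest negative residuals

print("harder than predicted:", hardest)
print("easier than predicted:", easiest)
```

Whatever the controls fail to explain, including story strength, ends up in `resid`, so the two ends of the sorted residuals play the role of the last two lists.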


kanji recall rate statistics - kame3 - 2009-12-31

*Major Bump* :P

Perhaps it would be fun, after almost three years, to see the above statistics again and compare them. Is that possible, Fuaburisu?
Also, if possible, other statistics I would be interested in (I don't know if they can be obtained):
How many of the people who get past 100 reviews (thereby excluding people who just try it for an instant) manage to finish RTK1?
What is the average amount of reviews when people finish?
What is the average amount of time to finish?
What is the highest amount of reviews and what is the average review count for people who are finished?

I think knowing stuff like this can quantify how well the RTK method is working and also how much time you need to put in to finish.
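
If per-user totals were available, the numbers asked for above could be computed along these lines. This is only a sketch: the record layout (review count, finished flag, days to finish) is an assumption, not the site's actual schema, and the five users are invented.

```python
from statistics import mean

# Hypothetical per-user records:
# (total reviews at the time of counting, finished RTK1?, days to finish or None)
users = [
    (50, False, None),
    (400, False, None),
    (5200, True, 120),
    (8300, True, 200),
    (150, False, None),
]


def completion_stats(users, min_reviews=100):
    """Summary for users with more than min_reviews reviews."""
    serious = [u for u in users if u[0] > min_reviews]
    finished = [u for u in serious if u[1]]
    return {
        "finish_share": len(finished) / len(serious) if serious else None,
        "mean_reviews": mean(u[0] for u in finished) if finished else None,
        "max_reviews": max(u[0] for u in finished) if finished else None,
        "mean_days": mean(u[2] for u in finished) if finished else None,
    }


stats = completion_stats(users)
for name, value in stats.items():
    print(name, value)
```

The `min_reviews` cutoff implements the "past 100 reviews" filter, so people who only tried the site briefly do not drag down the finishing share.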