
Kanji lists, Joyo, non-joyo etc. (Sticky topic)

#51
(2017-03-16, 11:47 am)ItaiB Wrote: The English Wikipedia article on Kanji kentei. The counterpart Japanese Wikipedia article has different numbers (namely 2994 for level pre-1 and "approx. 6000" for level 1), but they still don't coincide with your list.

The information on Wikipedia (en) is misleading and incorrect.

The numbers 2965 and 6355 refer to the number of kanji code points in the first level and the 1st+2nd levels of the JIS X 0208 standard. Some 1級 kanji are not encoded in any of the JIS standards, and there has never been a 1:1 correspondence between the JIS X 0208 kanji and the 漢検 準1級/1級 lists. Some of the JIS X 0208 kanji literally have no historically attested readings.
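To see where the 2965 and 6355 figures come from, you can count the assigned cells in the code table directly. JIS X 0208 places level-1 kanji in rows (区) 16 through 47 and level-2 kanji in rows 48 through 84. This is a minimal sketch that assumes Python's built-in `euc_jp` codec maps exactly the JIS X 0208 repertoire in that byte range and raises an error on unassigned cells; if its table follows JIS X 0208:1990, the totals should land at or very near the two Wikipedia figures, but treat the output as a cross-check rather than an authority.

```python
# Count kanji cells in the two JIS X 0208 kanji levels by decoding every
# EUC-JP two-byte cell. Assumption: the stdlib "euc_jp" codec rejects
# unassigned cells with UnicodeDecodeError.


def count_decodable(first_row: int, last_row: int) -> int:
    """Count cells (ku, ten) in rows first_row..last_row that decode to a character."""
    count = 0
    for ku in range(first_row, last_row + 1):
        for ten in range(1, 95):  # each row has 94 cells
            cell = bytes([0xA0 + ku, 0xA0 + ten])  # EUC-JP byte pair for (ku, ten)
            try:
                cell.decode("euc_jp")
            except UnicodeDecodeError:
                continue  # unassigned cell
            count += 1
    return count


# Level 1 kanji occupy rows 16-47; level 2 kanji occupy rows 48-84.
level1 = count_decodable(16, 47)
level2 = count_decodable(48, 84)
print("level 1:", level1)
print("level 1 + level 2:", level1 + level2)
```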

The numbers on Wikipedia (ja) sound about right. When you add the "head" character kanji and the 旧字体 variants, you get around 3000/6000 distinct glyphs for 準1級/1級.
#52
fkb9g, thank you for this! I downloaded and will definitely play around with it even if it isn't complete.

Just curious, why did you intentionally leave out the 旧字体 characters? I would have thought they would be among the more useful rare kanji, unless you have zero interest in ever reading anything written before 1945.

Also (curious again), what official publications did you use? I have a bunch of Kentei study materials, but they were published before the Kentei got an overhaul along with the jouyou, and I haven't gotten around to purchasing updated materials. (The overhaul didn't affect the earlier school levels, but there were a LOT of changes to Kentei 2kyuu and up.)
Edited: 2017-03-16, 12:30 pm
#53
(2017-03-16, 12:27 pm)tanaquil Wrote: fkb9g, thank you for this! I downloaded and will definitely play around with it even if it isn't complete.

Just curious, why did you intentionally leave out the 旧字体 characters? I would have thought they would be among the more useful rare kanji, unless you have zero interest in ever reading anything written before 1945.

Also (curious again), what official publications did you use? I have a bunch of Kentei study materials, but they were published before the Kentei got an overhaul along with the jouyou, and I haven't gotten around to purchasing updated materials. (The overhaul didn't affect the earlier school levels, but there were a LOT of changes to Kentei 2kyuu and up.)

I referenced these sources: I'm thinking about getting these books, which have reference glyphs in 明朝 script (the 漢字辞典 uses 教科書 script).

In my mind, 新字体 and 旧字体 forms do not represent separate "standalone" characters, but are different facets of the same 漢字 (regardless of how they may be encoded in computer character sets). In a character entry in a Japanese 漢字辞典, you would see the 新字体 form at the head with variants (旧字体 & 異体字) at the side. Similarly, I consider the current 新字体 forms to be the "canonical" representation for Japanese kanji.
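The head-plus-variants model described above maps naturally onto a small data structure in which every form resolves to its 新字体 head. This is a minimal sketch, not the actual format of the list: `KanjiEntry` and `build_canonical_map` are hypothetical names, and the two entries are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class KanjiEntry:
    """One dictionary-style entry: a 新字体 head form plus its variant glyphs."""
    head: str                                          # canonical 新字体 form
    kyujitai: list[str] = field(default_factory=list)  # 旧字体 forms
    itaiji: list[str] = field(default_factory=list)    # other 異体字 forms


def build_canonical_map(entries: list[KanjiEntry]) -> dict[str, str]:
    """Map every form (head or variant) to the head form of its entry."""
    canonical: dict[str, str] = {}
    for entry in entries:
        canonical[entry.head] = entry.head
        for variant in entry.kyujitai + entry.itaiji:
            canonical[variant] = entry.head
    return canonical


# Illustrative entries only; a real list would come from a 漢字辞典 or the 漢検 materials.
entries = [
    KanjiEntry(head="国", kyujitai=["國"]),
    KanjiEntry(head="学", kyujitai=["學"]),
]
canonical = build_canonical_map(entries)
print(canonical["國"])  # 国
```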

Sometimes it gets a bit complicated. For example, see this Kanjipedia entry. Under the Unicode CJK ideograph unification rules, the head character is unambiguously equivalent to 槔, but note that the PNG graphic does not match how computer fonts currently render 槔. Japanese web pages often use 槹 (a 漢検 異体字 form that reflects its traditional Chinese origin) to represent this kanji. In my list, I use 槔.
Edited: 2017-03-16, 9:11 pm
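Because 槔 and 槹 are separate code points, Unicode normalization will not fold one into the other, so a list that standardizes on 槔 needs an explicit mapping for text that uses 槹. A minimal Python sketch; the `VARIANT_TO_HEAD` table and `fold_variants` helper are hypothetical and hold only this one pair.

```python
import unicodedata

head = "槔"     # form used in the list
variant = "槹"  # 異体字 form often seen on Japanese web pages

# Two distinct code points.
print(f"head U+{ord(head):04X}, variant U+{ord(variant):04X}")

# Neither character has a decomposition mapping, so no normalization
# form turns the variant into the head form.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, variant) == variant

# Folding therefore has to use a hand-maintained table.
VARIANT_TO_HEAD = {variant: head}


def fold_variants(text: str) -> str:
    """Replace each known variant with its head form; leave other characters alone."""
    return "".join(VARIANT_TO_HEAD.get(ch, ch) for ch in text)


print(fold_variants("桔槹"))  # 桔槔
```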
#54
I have imported your file to Anki, opened the browser and searched for all the cards whose 'IndexRTK' field is non-empty. The number of results was 2946. Aren't there supposed to be 3000?
#55
(2017-03-16, 2:33 pm)ItaiB Wrote: I have imported your file to Anki, opened the browser and searched for all the cards whose 'IndexRTK' field is non-empty. The number of results was 2946. Aren't there supposed to be 3000?

I checked the spreadsheet I downloaded, sorted by RTK number. At a cursory glance, it looks like the characters that are in Heisig but are kyuujitai were omitted (every missing character I checked was marked "old" in its entry on koohii). That probably accounts for the difference.
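One way to confirm that the gap is exactly the 旧字体 frames is to list which RTK numbers are absent from the deck and compare them with the "old" entries on koohii. (In the Anki browser, `IndexRTK:_*` finds notes whose IndexRTK field is non-empty.) A minimal sketch, assuming the notes are exported to CSV with a header row and an `IndexRTK` column holding a plain frame number; the export layout and the `missing_rtk_numbers` helper are assumptions.

```python
import csv
import io


def missing_rtk_numbers(csv_text: str, column: str = "IndexRTK", total: int = 3000) -> list[int]:
    """Return RTK frame numbers 1..total that no row's `column` field contains."""
    present: set[int] = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value.isdigit():  # skip empty or non-numeric fields
            present.add(int(value))
    return [n for n in range(1, total + 1) if n not in present]


# Tiny stand-in for the real export: frames 1, 2 and 4 present, 3 absent.
sample = "Kanji,IndexRTK\n一,1\n二,2\n四,4\n"
print(missing_rtk_numbers(sample, total=4))  # [3]
```

Run against the full export with the default `total=3000`, the returned list should contain the frames to check against koohii.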
#56
(2017-03-16, 2:33 pm)ItaiB Wrote: I have imported your file to Anki, opened the browser and searched for all the cards whose 'IndexRTK' field is non-empty. The number of results was 2946. Aren't there supposed to be 3000?

The 3000 writing frames in RTK include some 旧字体 glyphs. (In my opinion, they do not warrant distinct frames.)

I do not include 旧字体 variants in my 漢検 list because it is not possible to make a comprehensive list at this time. The 漢検 variant glyphs either:
  1. have their own distinct Unicode code point, or
  2. have no distinct code point and must be differentiated by a variation sequence, or
  3. are not in the Unicode ideographic variation database and must be distinguished some other way (for example by Moji Joho database glyph ID), or
  4. only exist in the 漢検 reference materials and are not in any other database/font (that I can find).
The head characters can be distinctly represented with unique Unicode code points (category 1 above) but the variants cannot.
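The four categories above can be modeled as a small record in which each variant glyph carries whichever identifier it actually has, so a tool can tell which variants are writable as plain text. This is a sketch under assumptions: the class names are hypothetical, the Moji Joho ID is a placeholder, and the selector number is chosen for illustration rather than taken from any registered IVD collection. Ideographic variation selectors VS17 to VS256 occupy U+E0100 to U+E01EF.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Encoding(Enum):
    """The four ways a 漢検 variant glyph can (or cannot) be identified."""
    CODE_POINT = 1      # has its own Unicode code point
    VARIATION_SEQ = 2   # base code point + ideographic variation selector
    EXTERNAL_ID = 3     # only an outside identifier, e.g. a Moji Joho glyph ID
    REFERENCE_ONLY = 4  # only in the 漢検 reference materials


@dataclass
class VariantGlyph:
    base: str                          # head character the variant belongs to
    code_point: Optional[str] = None   # single-character string, if one exists
    selector: Optional[int] = None     # VS number 17..256, if an IVS exists
    external_id: Optional[str] = None  # e.g. a Moji Joho ID

    def category(self) -> Encoding:
        if self.code_point is not None:
            return Encoding.CODE_POINT
        if self.selector is not None:
            return Encoding.VARIATION_SEQ
        if self.external_id is not None:
            return Encoding.EXTERNAL_ID
        return Encoding.REFERENCE_ONLY

    def plain_text(self) -> Optional[str]:
        """Plain-text form, or None when the glyph cannot be written as text."""
        if self.code_point is not None:
            return self.code_point
        if self.selector is not None:
            if not 17 <= self.selector <= 256:
                raise ValueError("ideographic selectors are VS17..VS256")
            # VS17 is U+E0100, VS18 is U+E0101, and so on.
            return self.base + chr(0xE0100 + self.selector - 17)
        return None


examples = [
    VariantGlyph(base="国", code_point="國"),         # category 1
    VariantGlyph(base="葛", selector=17),             # category 2 (illustrative selector)
    VariantGlyph(base="葛", external_id="MJ000000"),  # category 3 (placeholder ID)
    VariantGlyph(base="葛"),                          # category 4
]
for glyph in examples:
    print(glyph.category().name, repr(glyph.plain_text()))
```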

I would be happy to provide a plain text list of all the 漢検 variants, but first we would need to get the 漢検-specific glyphs registered as an IVD collection. Are there any kanji experts here (like katsuo) who would like to help me develop such a proposal?
#57
(2017-03-16, 1:27 pm)fkb9g Wrote: In my mind, 新字体 and 旧字体 forms do not represent separate "standalone" characters, but are different facets of the same 漢字 (regardless of how they may be encoded in computer character sets). In a character entry in a Japanese 漢字辞典, you would see the 新字体 form at the head with variants (旧字体 & 異体字) at the side. Similarly, I consider the current 新字体 forms to be the "canonical" representation for Japanese kanji.

Thanks very much for the links.

I see your point, of course. On the other hand, if you don't happen to *know* that a given old-form character is a variant on a new standard-form character, you might as well not know it at all, which means that it is as much an object to be learned as any.

The whole history of what is and isn't in the JIS standard is fascinating. I only recently read about the few "ghost" characters in the standard, which seem to have entered the record purely as a result of user error.

Is there a record somewhere of how many people have actually passed Kentei 1kyuu?

(2017-03-16, 3:09 pm)fkb9g Wrote:
(2017-03-16, 2:33 pm)ItaiB Wrote: I have imported your file to Anki, opened the browser and searched for all the cards whose 'IndexRTK' field is non-empty. The number of results was 2946. Aren't there supposed to be 3000?

The 3000 writing frames in RTK include some 旧字体 glyphs. (In my opinion, they do not warrant distinct frames.)

I do not include 旧字体 variants in my 漢検 list because it is not possible to make a comprehensive list at this time. The 漢検 variant glyphs either:
  1. have their own distinct Unicode code point, or
  2. have no distinct code point and must be differentiated by a variation sequence, or
  3. are not in the Unicode ideographic variation database and must be distinguished some other way (for example by Moji Joho database glyph ID), or
  4. only exist in the 漢検 reference materials and are not in any other database/font (that I can find).
The head characters can be distinctly represented with unique Unicode code points (category 1 above) but the variants cannot.

I would be happy to provide a plain text list of all the 漢検 variants, but first we would need to get the 漢検-specific glyphs registered as an IVD collection. Are there any kanji experts here (like katsuo) who would like to help me develop such a proposal?

This is a great point! (The part about how some non-distinct code points cannot easily be represented in computer files. I have run into this issue myself in my initial attempts to isolate old-form kanji.)

I have no idea how to solve it, but would love to see input from the experts.
Edited: 2017-03-16, 3:13 pm
#58
I see. Well, thanks for the replies and for the file.
#59
(2017-03-16, 3:10 pm)tanaquil Wrote: I see your point, of course. On the other hand, if you don't happen to *know* that a given old-form character is a variant on a new standard-form character, you might as well not know it at all, which means that it is as much an object to be learned as any.

I agree. In my personal kanji deck, I include the 旧字体 forms for the jōyō kanji as secondary information on the back of the card (I don't test myself on them). Some of the 旧字体 forms require Japanese fonts with expanded glyph ranges (like IPAex Mincho http://ipafont.ipa.go.jp/ or Hanazono http://fonts.jp/hanazono/) and can be accessed using OpenType features. Anki doesn't officially support OpenType, but the functionality works in WebKit, so I can view this subset of 旧字体 in my Anki cards.

(2017-03-16, 3:10 pm)tanaquil Wrote: Is there a record somewhere of how many people have actually passed Kentei 1kyuu?

The pass rate is low, but there are some Westerners who have passed.

(2017-03-16, 3:10 pm)tanaquil Wrote: This is a great point! (The part about how some non-distinct code points cannot easily be represented in computer files. I have run into this issue myself in my initial attempts to isolate old-form kanji.)

I have no idea how to solve it, but would love to see input from the experts.

Any kanji glyph variant can potentially be represented in plain text, provided that ① it has been registered in the Unicode IVD, ② you have a font to support it, and ③ your computer operating system and applications support IVD.
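In plain text, such a variant is the base character followed by an ideographic variation selector (U+E0100 to U+E01EF, i.e. VS17 to VS256). Software without IVD support still sees that extra code point, which is one reason naive character counts can disagree. A minimal sketch of grouping a string into (base, selector) units; `split_ivs` is a hypothetical helper, and the 葛 + VS17 pair is only an illustration, not a claim about any particular registered collection.

```python
from typing import Optional

VS_IDEOGRAPHIC = range(0xE0100, 0xE01F0)  # VS17..VS256


def split_ivs(text: str) -> list[tuple[str, Optional[int]]]:
    """Split text into (base character, VS number or None) pairs."""
    units: list[tuple[str, Optional[int]]] = []
    for ch in text:
        cp = ord(ch)
        if cp in VS_IDEOGRAPHIC and units and units[-1][1] is None:
            # Attach the selector to the preceding base character.
            base, _ = units[-1]
            units[-1] = (base, cp - 0xE0100 + 17)
        else:
            # Ordinary character, or a stray selector kept as its own unit.
            units.append((ch, None))
    return units


text = "葛\U000E0100飾"
print(len(text))        # 3 code points
print(split_ivs(text))  # [('葛', 17), ('飾', None)]
```

Grouping this way lets a word list or deck treat 葛 plus its selector as one kanji rather than two code points.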
#60
By the way what do the initials 'KO' and 'JiShop' mean in the field names 'IndexKO', 'BookKO', 'IndexJiShop' and 'KeywordJiShop'?
Edited: 2017-03-16, 10:48 pm