How many RTKv1 kanji are used as primitives in subsequent kanji?

Thread on the kanji koohii FORUM (http://forum.koohii.com), Forum: Learning Japanese (http://forum.koohii.com/forum-4.html), Forum: Remembering the Kanji (http://forum.koohii.com/forum-7.html), Thread: /thread-12052.html

How many RTKv1 kanji are used as primitives in subsequent kanji? - aldebrn - 2014-08-06

I know there are the 300-odd primitives that aren't real kanji, listed in Appendix II of RTK volume 1, 6th edition. But does anyone have a good guess as to how many bona fide kanji are used as primitive building blocks for subsequent kanji? Or equivalently, how many kanji are not used as building blocks (terminal leaf nodes in the Heisig dependency graph)?

I can lower-bound this number at 514, i.e., at least 514 of the 2200 RTK1 kanji are used in later kanji. I calculated this by looking at the "Primitive look-up data" column of Nukemarine's RTK Spreadsheet RTK 1 and 3 and seeing which keywords showed up in later kanji's primitive lists. But this is incomplete, because Nukemarine followed Heisig in using different keywords for the same kanji when used as a regular kanji (e.g., "oneself" for 自) versus as a primitive ("nose"), so kanji which are used as primitives under a different name, like 自, are not counted by this calculation.

If anyone's done the extra legwork to figure out just how many kanji are used as primitives, and can share that, many thanks for your hard work and generosity!

How many RTKv1 kanji are used as primitives in subsequent kanji? - john555 - 2014-08-06

Just a question: what is the purpose of this information? I'm just curious, because I worked through RTK1 (finished April 2014) and didn't need the info you're looking for.

How many RTKv1 kanji are used as primitives in subsequent kanji? - aldebrn - 2014-08-06

john555 Wrote: what is the purpose of this information?

I didn't want to explain this in the original post, hoping nobody would ask. There are a couple of reasons.

In the immediate term, because I'm trying to use a modified person-action-object structure on my stories (PAO is a memory sport thing, see here). I can simplify the story/system for kanji that aren't going to be used as primitives later.
These stories will be just as memorable, but will lack the extra hook that would enable them to participate in later kanji, thus saving time and effort. Even with more than 514 kanji used as primitives (and so at most 2200 - 514 = 1686 'leaf node kanji' that are not), it might make sense to sit down and identify all leaf nodes to maximize the chances of this method's success. (Apparently both Japanese and memory are hobbies of mine. I recognize most people would just buckle down and focus on their primary goal of Japanese.)

---

In the longer term, I'd like to construct the Heisig "dependency graph", with kanji/keywords being nodes and directed edges indicating which are used as primitives for which others, and this number would help get me started towards that. Such a graph would make a neat visualization ("mappr0n" is what kids are calling it nowadays?), but it also enables potentially interesting things like reordering the kanji from what Heisig presents, i.e., modifying the path that Heisig uses to traverse all 2200 nodes.

This is because if you specify the ordering of the primitives (kanji and non-kanji) and apply a couple of simple hierarchy-preserving rules, i.e., (1) don't introduce a kanji whose primitives you haven't already introduced, and (2) before introducing a new primitive, introduce all kanji that can be built using previous primitives, you get something like RTK. Heisig applied these rules quite loosely in his RTK1, since the first two lessons are mostly a bunch of primitives to "prime the pump", and sometimes he delays introducing a kanji, e.g., 蜜 could have been introduced much earlier but he wanted to pair it with 密. But the magic of RTK is that for the most part, these rules are followed and you learn kanji by building on ones you already know.

Therefore, you could pick your own ordering of primitives (kanji and Heisig's non-kanji), apply the rules automatically, and get a different sequence of the 2200 jouyou kanji that you could learn, instead of the book's.
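
A minimal sketch of the two rules above, in Python. The function name `build_sequence`, the `components` mapping, and the toy decompositions are assumptions made for illustration; they are not taken from RTK or any published dataset. Each time a primitive is introduced, the sketch keeps adding every kanji whose parts are all known (rule 2) and never adds a kanji with an unknown part (rule 1).

```python
def build_sequence(primitive_order, components):
    """Return the order in which primitives and kanji get introduced.

    primitive_order: atomic primitives, in the order you choose to teach them.
    components: dict mapping each composite kanji to the set of its parts
                (hypothetical data; parts may be primitives or other kanji).
    """
    known = set()
    sequence = []
    for primitive in primitive_order:
        known.add(primitive)
        sequence.append(primitive)
        # Rule 2: before moving to the next primitive, exhaust everything
        # that can be built from what is already known.
        progress = True
        while progress:
            progress = False
            for kanji, parts in components.items():
                # Rule 1: only introduce a kanji whose parts are all known.
                if kanji not in known and parts <= known:
                    known.add(kanji)
                    sequence.append(kanji)
                    progress = True
    return sequence


# Toy decompositions, for illustration only.
components = {
    "明": {"日", "月"},
    "朋": {"月"},
    "唱": {"口", "昌"},
    "昌": {"日"},
    "品": {"口"},
}

print(build_sequence(["日", "月", "口"], components))
# ['日', '昌', '月', '明', '朋', '口', '唱', '品']
print(build_sequence(["口", "月", "日"], components))
# ['口', '品', '月', '朋', '日', '明', '昌', '唱']
```

Changing only the order of the three primitives changes the whole kanji sequence, which is the lever the reordering idea relies on. Ties between kanji that become buildable at the same moment are broken here by dictionary order, which is an arbitrary choice.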

Why would you want to do something so daft? Because you could use an algorithm to figure out what ordering of primitives would give you your ideal RTK. Suppose you wanted to pack the lower-grade kanji (grade school) earlier in the sequence. Or make more common newspaper words appear earlier in the sequence. Or you had a corpus of literature that you really wanted to read sooner rather than later. In all these cases, your desiderata would become soft constraints which a program would use to find a sequence of primitives, which in turn builds a complete ordering of the jouyou kanji such that your objectives are loosely met. It's a complicated graph optimization problem in the general case, but I think fast, simple heuristics will get you 90% of the way there, which is likely good enough, especially since I imagine most people would manually tweak the raw output (smooth out any clumps in the primitives, etc.).

---

I guess I've already made this post about "Ways to improve RTK", so here's another idea. We could introduce new non-kanji primitives for combinations that are found more than once, like 羽+隹 used in 濯 and 曜 (I use someone's idea of calling this combination "futon"), or 臣+又 used in 賢, 腎, and 堅 (I'm calling 臣+又 "Ambassador", in the Firefly-courtesan sense). With the dependency graph, you could automatically find out where a group of primitives is being used multiple times and potentially decide to make new non-kanji primitives out of those. (Sure, you could do this by just going through RTK and keeping notes on combinations that you encounter, like the two above, but by then you're not interested in ways to improve RTK.)

One last thing that I think we really ought to fix is the synonym keywords: I just got to "storehouse", which comes after "warehouse", and looking ahead I see "godown". Using WordNet semantic distance metrics, or just by asking those who've completed RTK, we need to identify clusters of synonyms and change them.

The keywords, and the core ordering of primitives, used in RTK aren't sacred. I suspect those were chosen because they worked for Heisig himself, all those years ago, and aren't going to work for everyone today. The core of the method (kanji building on each other in large subsets) can be used with different keywords, primitives, and orderings. This question is feeling towards this larger goal.

I look forward to reading many (as well as no) sound reasons why any/all of the above is foolish, from PAO down to changing keywords. But like Khatz says, learning is fun. Once you've learned something, it's no longer fun because by then you're back to the grind of living and suffering, except now you complain about taxes and traffic in 日本語.
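
The idea earlier in this post of finding primitive combinations that occur more than once (such as 羽+隹 or 臣+又) can be sketched as follows. This is only an illustration: the function name `repeated_pairs` and the flat decompositions are assumptions, and a real source such as a nested dependency graph would first need to be flattened into one set of parts per kanji.

```python
from collections import defaultdict
from itertools import combinations


def repeated_pairs(components, min_uses=2):
    """Map each pair of parts to the kanji that contain both of them,
    keeping only pairs found in at least `min_uses` kanji."""
    users = defaultdict(list)
    for kanji, parts in components.items():
        # sorted() only makes the iteration order deterministic.
        for pair in combinations(sorted(parts), 2):
            users[frozenset(pair)].append(kanji)
    return {pair: ks for pair, ks in users.items() if len(ks) >= min_uses}


# Hypothetical flat decompositions of the kanji named in the post.
components = {
    "濯": {"氵", "羽", "隹"},
    "曜": {"日", "羽", "隹"},
    "賢": {"臣", "又", "貝"},
    "腎": {"臣", "又", "月"},
    "堅": {"臣", "又", "土"},
}

for pair, kanji in repeated_pairs(components).items():
    print("+".join(sorted(pair)), "->", "".join(kanji))
```

Only pairs are counted here; extending the same counting to triples or larger groups is a matter of changing the `2` passed to `combinations`.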

How many RTKv1 kanji are used as primitives in subsequent kanji? - Vempele - 2014-08-06

The KanjiVG project has dissections of over 6000 kanji.

How many RTKv1 kanji are used as primitives in subsequent kanji? - Katsuo - 2014-08-06

aldebrn Wrote: I calculated this by looking at the "Primitive look-up data."

The "Primitive look-up data" is something I made as a quick and rough method to find kanji for someone who knows Heisig's primitive names and/or Japanese names for parts of the kanji. It's not comprehensive or precise and is intended for multi-lookup methods in a database rather than a spreadsheet.

Quote: I guess I've already made this post about "Ways to improve RTK", so here's another idea. We could introduce new non-kanji primitives for combinations that are found more than once, like 羽+隹 used in 濯 and 曜 (I use someone's idea of calling this combination "futon"), or 臣+又 used in 賢, 腎, and 堅 (I'm calling 臣+又 "Ambassador", in the Firefly-courtesan sense).

"Futon" was one of my ideas, so I hope it was helpful. I collected various people's suggestions (links below) a few years ago, but they are based on old editions of the book so the numbers won't match if you are using the latest.

Characters useful for learning or simplifying others found earlier in the text. here
Primitive combinations that are not given a name. here

How many RTKv1 kanji are used as primitives in subsequent kanji? - aldebrn - 2014-08-06

Vempele Wrote: The KanjiVG project has dissections of over 6000 kanji.

You've recommended this project to me a couple of times now (thanks!) and I'm finally properly checking it out. I got the latest release and I cannot seem to find a kanji I picked at random, 堅, U+5805. 5806 is there, but no 5805. Is it possible that the project hasn't gotten to this jouyou kanji? Or am I just not looking hard enough?

Katsuo Wrote: "Futon" was one of my ideas, so I hope it was helpful.
I collected various people's suggestions (links below) a few years ago, but they are based on old editions of the book so the numbers won't match if you are using the latest.

This will be really helpful! I wish this list were somehow more integrated or integratable with the other RTK-derived data products. I personally have all the kanji and keywords, etc., in a Markdown file that I add stories to (looking them up on kanji.koohii.com) before learning them in Anki/Memrise, only glancing at the book occasionally (if not rarely, now that Fabrice added the "Sample Words" feature). I wish either the list, Koohii, or the book had some way of indicating these new primitives! If only we had a directed dependency graph!

Edit: well, we have dependency graphs here and there, e.g., Ravenbrook, just not public and editable.

How many RTKv1 kanji are used as primitives in subsequent kanji? - Vempele - 2014-08-06

aldebrn Wrote: You've recommended this project to me a couple of times now (thanks!) and I'm finally properly checking it out. I got the latest release and I cannot seem to find a kanji I picked at random, 堅, U+5805. 5806 is there, but no 5805. Is it possible that the project hasn't gotten to this jouyou kanji? Or am I just not looking hard enough?

Its entry begins on line 23575 of the XML file. I also see 05805.svg in kanjivg-20140727-main.zip.

<kanji id="kvg:kanji_05805">
<g id="kvg:05805" kvg:element="堅">

How many RTKv1 kanji are used as primitives in subsequent kanji? - aldebrn - 2014-08-07

Vempele Wrote: I also see 05805.svg in kanjivg-20140727-main.zip.

Ok, with that Windows fail out of the way (sorry!), we have an answer! 530.

Well, a more accurate but not yet perfect answer. For each of the 2200 jouyou kanji, I searched to see if it was mentioned in the KanjiVG data for any of the other jouyou kanji. Doing this causes the last element of RTK1, 巳, to match earlier kanji, viz., 把, 色, 絶, 艶, & 肥, since 巴's KanjiVG description relies on 巳.
So there are going to be some of these kanji that really are leaf nodes appearing to be non-leaf nodes. There are 1670 leaf nodes by this method, which leaves 2200 - 1670 = 530 kanji that are used to build other kanji (non-leaf nodes). Due to the limitations of the method described above, this is an upper bound on the number of non-leaves. So we've bracketed the answer to roughly between 514 (see the OP) and 530.

I put a list of these tentative leaf nodes, with their RTK1 numbers, at https://gist.github.com/fasiha/e18fc2b8ea69f3afc9d0

Edit: updated it with the full component graph, which Github is putting first. Scroll down-down-down to see the list of the 1670 leaf nodes. Huh: 一 is not a component of 二, I wonder why.

Vempele, if you could entertain another question about KanjiVG. I was happy to see in the entry for 潜 (06f5c.svg) that the 氵 element has a tag indicating that its original is 水. This is good, because that'll match an early RTK1 kanji. But for 臓 (081d3.svg), I am dismayed that the 月 element is shown to be derived from 肉, which isn't in RTK 1 or 3. Is this some advanced kanji convention, 月 deriving from 肉? (This question is about extending the above work to invert it: given a kanji, what makes it up, in terms of Heisig's primitives.) Thanks!

Edit: in case anyone is dying to know: if you run the above processing against the 3028 RTK 1 and 3 kanji, the number of leaf nodes in the 2200 jouyou kanji drops to 1579, i.e., 91 kanji in RTK1 aren't used as primitives in RTK1 but are used as primitives in RTK3 (according to KanjiVG). This information is valuable, I promise!

How many RTKv1 kanji are used as primitives in subsequent kanji? - Vempele - 2014-08-07

肉 is 1022 (meat) in RTK1 5th edition and should definitely still be there in 6th. "The abbreviated form of this character gave us the primitive meaning of flesh or part of the body for the kanji 月."

How many RTKv1 kanji are used as primitives in subsequent kanji?
- aldebrn - 2014-08-07

Vempele Wrote: 肉 is 1022 (meat) in RTK1 5th edition and should definitely still be there in 6th. "The abbreviated form of this character gave us the primitive meaning of flesh or part of the body for the kanji 月."

Sublime Text fail. I.e., I failed via Sublime Text. Thanks for putting up with me.

So KanjiVG is really this smart. The same Unicode code point 月 is used in both these meat-related kanji and the moon-related ones like 明, and they've tagged only the former as being derived from 肉 :o

How many RTKv1 kanji are used as primitives in subsequent kanji? - aldebrn - 2014-08-12

Vempele Wrote: The KanjiVG project has dissections of over 6000 kanji.

To better visualize the KanjiVG database, I created this very simple visualization tool that decomposes a given kanji's SVG file into the elements that KanjiVG deigns to name and shows their dependency structure. There are certainly some interesting twists and surprises, and I've just begun to study this database: http://fasiha.github.io/kanjivg-explorer/

Vempele, are you a KanjiVG dev?

How many RTKv1 kanji are used as primitives in subsequent kanji? - Vempele - 2014-08-12

aldebrn Wrote: Vempele, are you a KanjiVG dev?

No, I learned about it from http://namakajiri.net/nikki/testing-the-power-of-phonetic-components-in-japanese-kanji/ (see also: http://forum.koohii.com/showthread.php?tid=11002&page=2 )

How many RTKv1 kanji are used as primitives in subsequent kanji? - aldebrn - 2014-12-19

Stumbled on this while searching for something else. Using the kanji dependency graph I've been working on, I can answer this question now with some confidence. 513 RTK1 kanji are used as components of some other kanji in RTK1. 606 RTK1 kanji are used as components of some kanji in RTK1 *and* RTK3 taken together.

Those 513 non-leaf nodes in RTK1 are:

一二三五六七八九十口日月田目古吾冒朋明品呂昌早世胃旦亘旧自白百中千舌升丸寸占上下卓朝貝貞員見元頁凡万句旬勺首乙直具真工左右有刀刃切召昭則丁可子了女母貫兄小少大多夕名石肖光太臭奇川州水永泉原土圭寺火炎灰魚里黒量同守完安木林相本未末朱若苗兆犬然牛告先介合王玉呈全主金道車前各客処軍高享亭京景舎周士吉壮売敬言式成止歩武正定走是建延衣巾布市帯制雨雲冬天立章帝童匕匂頃北比昆皆旨毎敏乞欠次音亡荒方放曽東廷県虫風己包竜家羊焦午羽固因回麻心忍志串思意恵感憂必手我義戒刑才乃及史吏更又双隻奴友支叔反爪将采受愛広台去会至到致育充允出山入分公谷容賞皮波列死耳取最敢夫替失臣蔵巨力男加行復従徴秋利委秀米迷求竹人伐宿保付府任代化何久内丙肉以瓦善夜勿易尼屋屈居尺戸戻雇示尉禁宗祭察由甲申果斤折斬斥争唐君需両歯曲曹斗用昔廿庶度半巻片之不矢知矛務弓弔弱与身射考孝者官父交足路骨隊穴空丘兵糸維幾玄畜系孫却卸令勇疑厄宛留臼酉尊豆豊皿血監即良食既平凶辛新幸執卑亥麦青責表契害生星寿春奉垂乗今念予兼西要票栗南門間倉非侯干余束頼重動疾匹区登発彦参文斉楽央赤黄色甘某甚貴並普共異暴井亜角冊氏民浦郷郎盾段司舟般瓜妻面革呉牙番毛為長単鳥属岡缶就免象馬虚鹿能寅辰農鬼屯且丑卯.

The extra 93 RTK1 kanji that become non-leaf when you look at RTK3 are:

別如沙泊時厘宣連夏冥覚故獄頻謁豪湯美差習困国園鼻丈密欲買規賛堅香奥数便卒物尽扇奈神質雪康庸散弟老著署諸追阜師累御零領通危喜節刈希頓静産奏難華陰廉粛英遣無助宜殿解郭后孤益衰声尾宅畏厳虎慮巳.

A brief word about why this might be useful: if you're tempted to skip/delay learning a kanji, for whatever reason, it can be handy to know whether it's used as a building block of later, possibly more useful/interesting kanji.

How many RTKv1 kanji are used as primitives in subsequent kanji? - lauri_ranta - 2014-12-21

I got 480 if the components have to appear before the kanji they are part of and 486 if not:

# Components must come earlier in the sequence (field 12 is presumably the RTK index, field 17 the decomposition).
$ curl -s jptxt.net/kanji.txt|awk -F\; '/^[^#]/&&$12&&$12<=2200'|sort -t\; -nk12|awk -F\; '{for(x in a){if($17~x){print x}};a[$1]}'|awk '!a[$0]++'|wc -l
480
# No ordering constraint: intersect the kanji column with every character found in the decomposition column.
$ rtk=$(curl -s jptxt.net/kanji.txt|awk -F\; '/^[^#]/&&$12&&$12<=2200');comm -12 <(cut -d\; -f1<<<"$rtk"|sort -u) <(cut -d\; -f17<<<"$rtk"|grep -o .|sort -u)|wc -l
486

The decomposition data is from https://cjkdecomp.codeplex.com, which doesn't, for example, include 史 as a component of 吏 or 賞 as a component of 償. (Maybe I should switch to kanjivg.)

The "kanji decomposition" part of http://jptxt.net/miscellaneous.txt includes these 368 component kanji: 六 six 七 seven 九 nine 古 old 吾 I 冒 risk 朋 companion 明 bright 昌 prosperous 早 early 胃 stomach 旦 nightbreak 亘 span 旧 olden times 中 in 千 thousand 升 measuring box 丸 round 占 fortune-telling 卓 eminent 朝 morning 貞 upright 員 employee 元 beginning 凡 mediocre 句 phrase 旬 decameron 勺 ladle 直 straightaway 具 tool 真 true 左 left 右 right 有 possess 刃 blade 切 cut 召 seduce 昭 shining 則 rule 丁 street 可 can 了 complete 貫 pierce 兄 elder brother 少 few 多 many 名 name 肖 resemblance 太 plump 奇 strange 州 state 永 eternity 泉 spring 原 meadow 圭 squared jewel 寺 Buddhist temple 炎 inflammation 灰 ashes 量 quantity 同 same 守 guard 完 perfect 安 relax 林 grove 相 inter- 本 book 未 not yet 末 extremity 朱 vermilion 若 young 苗 seedling 兆 portent 然 sort of thing 告 revelation 先 before 介 jammed in 合 fit 呈 display 全 whole 主 lord 道 road-way 前 in front 各 each 客 guest 冗 superfluous 軍 army 享 receive 亭 pavilion 京 capital 景 scenery 舎 cottage 周 circumference 吉 good luck 壮 robust 売 sell 敬 awe 式 style 成 turn into 歩 walk 武 warrior 正 correct 定 determine 是 just so 建 build 延 prolong 布 linen 市 market 制 system 雲 cloud 冬 winter 天 heavens 章 badge 帝 sovereign 童 juvenile 頃 about that time 北 north 昆 descendants 皆 all 旨 delicious 敏 cleverness 乞 beg 次 next 亡 deceased 荒 laid waste 曽 formerly 東 east 廷 courts 包 wrap 家 house 焦 char 午 noon 固 harden 因 cause 忍 endure 志 intention 串 shish kebab 思 think 意 idea 恵 favor 感 emotion 憂 melancholy 必 invariably 我 ego 義 righteousness 戒 commandment 刑 punish 才 genius 乃 from 及 reach out 丈 length 吏 officer 更 grow late 奴 guy 友 friend 叔 uncle 反 anti- 采 grab 受 accept 愛 love 広 wide 台 pedestal 去 gone 会 meeting 到 arrival 充 allot 允 license 出 exit 分 part 公 public 容 contain 波 waves 列 file 取 take 最 utmost 敢 daring 夫 husband 替 exchange 失 lose 蔵 storehouse 巨 gigantic 男 man 加 add 復 restore 従 accompany 秋 autumn 利 profit 委 committee 秀 excel 迷 astray 求 request 伐 fell 宿 inn 付 adhere 任 responsibility 代 substitute 化 change 何 what 久 long 
time 内 inside 丙 third class 以 by means of 善 virtuous 夜 night 勿 not 易 easy 尼 nun 屋 roof 屈 yield 居 reside 尺 shaku 戻 re- 雇 employ 尉 military officer 禁 prohibition 宗 religion 祭 ritual 察 guess 由 wherefore 甲 armor 申 speaketh 果 fruit 折 fold 斬 chop off 斥 reject 争 contend 唐 T'ang 君 old boy 需 demand 曲 bend 曹 cadet 昔 once upon a time 庶 commoner 度 degrees 半 half 巻 scroll 之 of 不 negative 知 know 務 task 弱 weak 与 bestow 射 shoot 考 consider 孝 filial piety 者 someone 官 bureaucrat 交 mingle 路 path 隊 regiment 空 empty 丘 hill 兵 soldier 維 fiber 幾 how many 畜 livestock 系 lineage 孫 grandchild 却 instead 卸 wholesale 令 orders 疑 doubt 厄 unlucky 宛 address 留 detain 尊 revered 豊 bountiful 監 oversee 即 instant 良 good 既 previously 平 even 新 new 幸 happiness 卑 lowly 亥 sign of the hog 責 blame 表 surface 契 pledge 害 harm 星 star 寿 longevity 奉 dedicate 垂 droop 乗 ride 今 now 念 wish 予 beforehand 兼 concurrently 票 ballot 南 south 間 interval 倉 godown 侯 marquis 余 too much 束 bundle 頼 trust 重 heavy 動 move 疾 rapidly 匹 equal 区 ward 登 ascend 発 discharge 彦 lad 参 visit 楽 music 央 center 某 so-and-so 甚 tremendously 貴 precious 並 row 普 universal 共 together 異 uncommon 暴 outburst 井 well 亜 Asia 冊 tome 民 people 郎 son 盾 shield 段 grade 司 director 般 carrier 妻 wife 呉 give 番 turn 為 do 単 simple 属 belong 岡 Mount 就 concerning 免 excuse 象 elephant 虚 void 能 ability 寅 sign of the tiger 農 agriculture 屯 barracks 且 moreover 卯 sign of the hare 此 this here 奄 encompassing 尤 understandably 或 a 也 est 巴 comma-design 曼 mandala 云 quote 莫 shalt 仔 animal offspring 叩 bash 吊 dangle 呆 dumbfounded 屏 folding screen 庄 shire 洛 old Kyoto 逢 tryst 愈 in the nick of time 昏 dusk 肋 rib 胡 uncivilized 窄 tight 妾 concubine 坐 sitting in meditation 朔 first day of the month 爾 let it be 赫 incandescent

It doesn't include any of the 214 traditional radicals, but it also includes RTK3 kanji that appear as components of RTK1 kanji.
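
The two ways of counting discussed in this thread (a component only counts if it comes earlier in the sequence, or it counts wherever it appears) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reimplementation of anyone's script: the function name `component_kanji`, the input shapes, and the three-kanji toy data are hypothetical, and a kanji is never counted as a component of itself.

```python
def component_kanji(ordered_kanji, decomposition, require_earlier=True):
    """Return the set of kanji used as a component of another kanji.

    ordered_kanji: kanji in learning order (e.g. by RTK index).
    decomposition: dict mapping a kanji to a string of its component characters.
    require_earlier: if True, a component only counts when it appears
                     before the kanji that contains it.
    """
    position = {k: i for i, k in enumerate(ordered_kanji)}
    used = set()
    for kanji in ordered_kanji:
        for part in decomposition.get(kanji, ""):
            if part == kanji or part not in position:
                continue  # skip self-references and parts outside the set
            if require_earlier and position[part] > position[kanji]:
                continue  # the component is introduced later; do not count it
            used.add(part)
    return used


# Toy data: 月 is deliberately placed after 明 to show the difference.
order = ["日", "明", "月"]
decomp = {"明": "日月"}

strict = component_kanji(order, decomp, require_earlier=True)
loose = component_kanji(order, decomp, require_earlier=False)
print(len(strict), len(loose))               # non-leaf counts: 1 2
print(len(order) - len(loose), "leaf node")  # leaves = total - non-leaves
```

The leaf count is always the total minus the non-leaf count, which is the same subtraction used earlier in the thread (2200 - 1670 = 530).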