kanji koohii FORUM
Tool to determine grade level of kanji in a text?



Tool to determine grade level of kanji in a text? - Roketzu - 2015-03-18

juniperpansy Wrote:
Roketzu Wrote:I've even found an unexpected use out of it
Please share with us!
It's something that only really helps me specifically so I'm not sure it would be of much use to explain! I guess I could anyway.

I have covered all joyo kanji in Skritter and had the intention to also add all non-joyo kanji from words that I am familiar with, which would bring the total number up to around 3000 kanji. The problem was I would have to go through my 30K+ Anki vocabulary deck looking for all the non-joyo kanji I'm familiar with to add them into Skritter, which would be really time consuming. Instead of doing that I can just export my vocab into a text file and then load it all into this tool and it spits out every non-joyo kanji from my vocabulary deck, making it very easy to then add words containing those kanji into Skritter.

So, it just spits out this, which is exactly what I needed.

鯛阿陀鮭燗鮑菖蒲鼈鮎毘壕菩薩牡袱筅杓碗粽伊掬蝦夷禄襖雁杏蒡萩蛤埴蓮雛檜菱鳳凰瓢箪猪磯兜楓蠣鉾簪柏鰹茅葺鞠欅雉桔屏柑犀飴桐鯉蒟蒻隈栗羹楠姑鯵鮪槇卍鱒茸巫輿蓑勒噌簾芭蕉茗韮苔熨濡祓鷺賽笹蝉迦鴟樺獅蘇煤硯蕎筍凧蛸笥鱈麩桶鳶寅屠柘鵜槌鶯鰻侘蕨鞋葵齧拵仄儲叩叶庇眩繋貰騙諺叉只々癌呆掴炒鎧閻倦斐髭剃咤睨撫磋琢喋轟溜暢淵狐掻頷舐轢溢扮掏攣斧迂賑讐罠颯唸痒嬉其罹姪饉憑殲榴訛姦詭絆膏歪坐噛猥垢屑洒紐嬰爾腋碍咎霞撒黴拗俄皺壺脆隕謳瀕勿錆瞑蠢娑冴遁袈裟窶鳩詫塵蝶辿褻耽怯偲覗辜寵狗遼蕩杖瞞痺埃娼揉諍喧嘩裡躱躊躇馴辟嵌醍醐僻憫綴愕竦烏憐揃澱呑吊淀惹鬘昂鼾錨尖嗜豹縞霰雹撞躬枡黍熔誹謗讒萌昏庖凌駕捲些繚這殆尤墟砦輾喘胤悌仇躾彷徨稀枷牌蔓捧彿擡汲啖呵轍屁祟煽誑纏狼嘔擽鄙懣傭攫忝囓竄顛晦憺礫蹂躙貶愧捏遥焜蝋莫爛窄鞘鴫綾捩弛棲慟哭綜辻褄厨憚捺孕蒔綰彗乍筈贅閃侃諤楚螺凪滾此躁儚蒼穹玲瓏鱗嵩婉吻狡訣黎瑞茫惟沫悸煌蕾套嗚毟悍碧董憊云雀捌霹靂綺蓼諌籐鼬餐虔頁掟慙灼佚虻滲腿蠅饐窺捥縋嘘饒甦篝甥髑髏舵釘顰曾惚瑕徘徊匍匐吝哨樽臍蛆旭嶺乖擂晒藁錚弋軋匙贄頸楕驕漕屍尸杭唖俯曰疹袂魁允牝鷹叡智囮倅朶媾凱詢疇睛拮秤甕坦蟠賤撓舅蝿鰭什梃狸藪呻袴艱掠汝嗄睫梁隼胡贔屓糊狽湛翳窪痣翔褪浚瀾訊禿抉絨毯躯毅醤奢疼斡撥炙肛剋渾鋲媚諂茄泄煉薔薇堰悄漲痙痍糞靄梯儘蛙閊竪弩稟焉瘤鼠几棘忽諜焔曙仔鹵鋸釵鉋噂矩肋埒憖炸悖咳桿蜘蛛穿燦挽蝕穢悴繍鑽獰擢剽腱奸撼娶靨凛妓癇癪吠喰焚悶疚渠餃嗇莢薙峙徽詈梱祀遙戮爺瞰襤褸嗤砥悛囁椰臙揶揄屹噤吃檻燻靡懊丑磔憮蹌踉縊餞姥劫逼訥慇懃嬌咄嗟凭髯廓跨哄炬燵琵琶饅粥鉤圃暈吾逢卿襞懺牽咬驢脛艘榊鴉渚誅琉咀嚼冑訝瞼櫓夥爬牢櫛咥鋏椒馳芥摸鹸箒燭椀斯誂或厭愈謂嗽於蔭襁褓踵嘗涸贋嚏籤臥嘴諄悉篭濠囀竿嘸雫嫋屎屡伜碌葦茹稍軈哉聳逞歎蛋恰抓呟脹卯壷躓兎迚乃皰欒詛箝啜餠朦朧稜俎嘯拿迸魘腑涜睾嚢燕駁伍濘沓箭幌堵膣襦袢猜跪魑魍魎珈琲鞄妾猾縺遽骰刮盂蘭宥槍淋鞭洩祷恍沌虱蒙蹲埠椿艫舳橘趨騾欝疱瘡忖梳犇邁窩靱餉踝蝲蛄簀榑礑播鏃荼函逡佇飄峻綯贖伽靭琥珀麾挺恫緋盥痰樟狒珊瑚靖揆痔聡攘擲閤幡瞥洛毬庄裔楔茜艀蟻粟賺錐棍彦噪脾薹綽淹杞腔橙灌做瀆穽幟抒夭噎摺厩鯖劈紆敲匪藹諫驟攪撹戟踞姜瀉臀僥倖幇酩酊蹙鷲澹托眈糜肴膵芻鍮捷遑昌芒讃繃而咆哮蹟曝謐憔瞠嬲熾膿矜樋

Turns out Skritter hasn't actually implemented every single kanji so I'm informing them of what they are missing as I go.


Tool to determine grade level of kanji in a text? - aldebrn - 2015-03-18

heisigberg Wrote:I looked over your Javascript code & it is really good, functional-style code! Kudos!
I also took 'inspiration' from your code & implemented a basic version of the program in Flask framework in Python (I'm learning Flask right now).
Thanks! Everyone has been so nice about this app even though it's really simple and straightforward :)

I would love to learn Flask. I used it just a tiny bit back in the day, and now that Heroku and VPS hosting are so cheap, I'd love to use it again and get back into backend development. Please post a link to your code when it goes live.

Roketzu Wrote:The problem was I would have to go through my 30K+ Anki vocabulary deck looking for all the non-joyo kanji I'm familiar with to add them into Skritter, which would be really time consuming.
So this (and I think a great many other tasks) can easily be done with regular expressions (regexps, online tutorial). Regexps are a mini language for describing search-and-replace. Here's how I'd have done this with nothing more than Sublime Text 3, a reasonably popular multi-platform text editor. I'm not sure if this will actually help anyone, and of course it's easier to use an actual program, but maybe if someone's thinking about learning regexps and needs clear proof of their power, this will provide it.

- Export your Anki database to plain text, and load it into Sublime.

- Run a regular expression that puts one character per line by inserting newlines after every character: Find -> Replace (Control-H in Windows), making sure to click on the ".*" icon on the left of the search bar (hover-tip is "Regular expression")

Find What: (.)
Replace With: \1\n

Click "Replace All". Note: type in or copy-paste what you see after the colon-space ": " exactly. I.e., open-parenthesis, dot, close-parenthesis for "Find What" and backslash, one, backslash, n for "Replace With".

- Sort all the lines: select all, Edit -> Sort Lines (Windows: F9)

- Remove duplicates: again, Find -> Replace (or Control-H in Windows).

Find What: (.)\n(\1\n)*
Replace With: \1

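This removes duplicates from the sorted list of characters: each run of identical lines collapses to a single character and the newlines are dropped, so the unique characters end up together on one line. It seems to leave some whitespace at the top; no big deal, delete that yourself.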

- Remove all joyo kanji: again, Find -> Replace,

Find What: [挨曖宛嵐畏萎椅彙茨咽淫唄鬱怨媛艶旺岡臆俺苛牙瓦楷潰諧崖蓋骸柿顎葛釜鎌韓玩伎亀毀畿臼嗅巾僅錦惧串窟熊詣憬稽隙桁拳鍵舷股虎錮勾梗喉乞傲駒頃痕沙挫采塞埼柵刹拶斬恣摯餌鹿叱嫉腫呪袖羞蹴憧拭尻芯腎須裾凄醒脊戚煎羨腺詮箋膳狙遡曽爽痩踪捉遜汰唾堆戴誰旦綻緻酎貼嘲捗椎爪鶴諦溺填妬賭藤瞳栃頓貪丼那奈梨謎鍋匂虹捻罵剥箸氾汎阪斑眉膝肘阜訃蔽餅璧蔑哺蜂貌頬睦勃昧枕蜜冥麺冶弥闇喩湧妖瘍沃拉辣藍璃慄侶瞭瑠呂賂弄籠麓脇哀慰詠悦閲炎宴欧殴乙卸穏佳架華嫁餓怪悔塊慨該概郭隔穫岳掛滑肝冠勘貫喚換敢緩企岐忌軌既棋棄騎欺犠菊吉喫虐虚峡脅凝斤緊愚偶遇刑契啓掲携憩鶏鯨倹賢幻孤弧雇顧娯悟孔巧甲坑拘郊控慌硬絞綱酵克獄恨紺魂墾債催削搾錯撮擦暫祉施諮侍慈軸疾湿赦邪殊寿潤遵如徐匠昇掌晶焦衝鐘冗嬢錠譲嘱辱伸辛審炊粋衰酔遂穂随髄瀬牲婿請斥隻惜籍摂潜繕阻措粗礎双桑掃葬遭憎促賊怠胎袋逮滞滝択卓託諾奪胆鍛壇稚畜窒抽鋳駐彫超聴陳鎮墜帝訂締哲斗塗凍陶痘匿篤豚尿粘婆排陪縛伐帆伴畔藩蛮卑碑泌姫漂苗赴符封伏覆紛墳癖募慕簿芳邦奉胞倣崩飽縫乏妨房某膨謀墨没翻魔埋膜又魅滅免幽誘憂揚揺擁抑裸濫吏隆了猟陵糧厘励零霊裂廉錬炉浪廊楼漏湾握扱依威為偉違維緯壱芋陰隠影鋭越援煙鉛縁汚押奥憶菓暇箇雅介戒皆壊較獲刈甘汗乾勧歓監環鑑含奇祈鬼幾輝儀戯詰却脚及丘朽巨拠距御凶叫狂況狭恐響驚仰駆屈掘繰恵傾継迎撃肩兼剣軒圏堅遣玄枯誇鼓互抗攻更恒荒香項稿豪込婚鎖彩歳載剤咲惨旨伺刺脂紫雌執芝斜煮釈寂朱狩趣需舟秀襲柔獣瞬旬巡盾召床沼称紹詳丈畳殖飾触侵振浸寝慎震薪尽陣尋吹是井姓征跡占扇鮮訴僧燥騒贈即俗耐替沢拓濁脱丹淡嘆端弾恥致遅蓄沖跳徴澄沈珍抵堤摘滴添殿吐途渡奴怒到逃倒唐桃透盗塔稲踏闘胴峠突鈍曇弐悩濃杯輩拍泊迫薄爆髪抜罰般販搬範繁盤彼疲被避尾微匹描浜敏怖浮普腐敷膚賦舞幅払噴柄壁捕舗抱峰砲忙坊肪冒傍帽凡盆慢漫妙眠矛霧娘茂猛網黙紋躍雄与誉溶腰踊謡翼雷頼絡欄離粒慮療隣涙隷齢麗暦劣烈恋露郎惑腕異遺域宇映延沿我灰拡革閣割株干巻看簡危机揮貴疑吸供胸郷勤筋系敬警劇激穴絹権憲源厳己呼誤后孝皇紅降鋼刻穀骨困砂座済裁策冊蚕至私姿視詞誌磁射捨尺若樹収宗就衆従縦縮熟純処署諸除将傷障城蒸針仁垂推寸盛聖誠宣専泉洗染善奏窓創装層操蔵臓存尊宅担探誕段暖値宙忠著庁頂潮賃痛展討党糖届難乳認納脳派拝背肺俳班晩否批秘腹奮並陛閉片補暮宝訪亡忘棒枚幕密盟模訳郵優幼欲翌乱卵覧裏律臨朗論圧移因永営衛易益液演応往桜恩可仮価河過賀快解格確額刊幹慣眼基寄規技義逆久旧居許境均禁句群経潔件券険検限現減故個護効厚耕鉱構興講混査再災妻採際在財罪雑酸賛支志枝師資飼示似識質舎謝授修述術準序招承証条状常情織職制性政勢精製税責績接設舌絶銭祖素総造像増則測属率損退貸態団断築張提程適敵統銅導徳独任燃能破犯判版比肥非備俵評貧布婦富武復複仏編弁保墓報豊防貿暴務夢迷綿輸余預容略留領愛案以衣位囲胃印英栄塩億加果貨課芽改械害街各覚完官管関観願希季紀喜旗器機議求泣救給挙漁共協鏡競極訓軍郡径型景芸欠結建健験固功好候航康告差菜最材昨札刷殺察参産散残士氏史司試児治辞失借種周祝順初松笑唱焼象照賞臣信成省清静席積折節説浅戦選然争倉巣束側続卒孫帯隊達単置仲貯兆腸低底停的典伝徒努灯堂働特得毒熱念敗梅博飯飛費必票標不夫付府副粉兵別辺変便包法望牧末満未脈民無約勇要養浴利陸良料量輪類令冷例歴連老労録悪安暗医委意育員院飲運泳駅央横屋温化荷界開階寒感漢館岸起期客究急級宮球去橋業曲局銀区苦具君係軽血決研県庫湖向幸港号根祭皿仕死使始指歯詩次事持式実写者主守取酒受州拾終習集住重宿所暑助昭消商章勝乗植申身神真深進世整昔全相送想息速族他打対待代第題炭短談着注柱丁帳調追定庭笛鉄転都度投豆島湯登等動童農波配倍箱畑発反坂板皮悲美鼻筆氷表秒病品負部服福物平返勉放味命面問役薬由油有遊予羊洋葉陽様落流旅両緑礼列練路和引羽雲園遠何科夏家歌画回会海絵外角楽活間丸岩顔汽記帰弓牛魚京強教近兄形計元言原戸古午後語工公広交光考行高黄合谷国黒今才細作算止市矢姉思紙寺自時室社弱首秋週春書少場色食心新親図数西声星晴切雪船線前組走多太体台地池知茶昼長鳥朝直通弟店点電刀冬当東答頭同道読内南肉馬売買麦半番父風分聞米歩母方北毎妹万明鳴毛門夜野友用曜来里理話一右雨円王音下火花貝学気九休玉金空月犬見五口校左三山子四糸字耳七車手十出女小上森人水正生青夕石赤千川先早草足村大男竹中虫町天田土二日入年白八百文木本名目立力林六亜尉逸姻韻畝浦疫謁猿凹翁虞渦禍靴寡稼蚊拐懐劾涯垣核殻嚇潟括喝渇褐轄且缶陥患堪棺款閑寛憾還艦頑飢宜偽擬糾窮拒享挟恭矯暁菌琴謹襟吟隅勲薫茎渓蛍慶傑嫌献謙繭顕懸弦呉碁江肯侯洪貢溝衡購拷剛酷昆懇佐唆詐砕宰栽斎崎索酢桟傘肢嗣賜滋璽漆遮蛇酌爵珠儒囚臭愁酬醜汁充渋銃叔淑粛塾俊准殉循庶緒叙升抄肖尚宵症祥渉訟硝粧詔奨彰償礁浄剰縄壌醸津唇娠紳診刃迅甚帥睡枢崇据杉斉逝誓析拙窃仙栓旋践遷薦繊
禅漸租疎塑壮荘捜挿曹喪槽霜藻妥堕惰駄泰濯但棚痴逐秩嫡衷弔挑眺釣懲勅朕塚漬坪呈廷邸亭貞逓偵艇泥迭徹撤悼搭棟筒謄騰洞督凸屯軟尼妊忍寧把覇廃培媒賠伯舶漠肌鉢閥煩頒妃披扉罷猫賓頻瓶扶附譜侮沸雰憤丙併塀幣弊偏遍泡俸褒剖紡朴僕撲堀奔麻摩磨抹岬銘妄盲耗厄愉諭癒唯悠猶裕融庸窯羅酪痢履柳竜硫虜涼僚寮倫累塁戻鈴賄枠]
Replace With:

Note there's nothing in the "Replace With" box: delete any text in it. This will delete all joyo kanji, leaving you with non-joyo kanji (and other stuff like kana, punctuation, numbers, Latin letters, etc.).
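
For anyone who would rather script it, here is a minimal Python sketch of the same four steps, reusing the two regexps from above with the standard `re` module. The function name `non_joyo_chars` and the `joyo` parameter are made up for illustration; `joyo` is assumed to be the long joyo kanji string from the last step, pasted in as an ordinary string. Python sorts by code point, so the order may differ from what Sublime's Sort Lines produces.

```python
import re


def non_joyo_chars(text: str, joyo: str) -> str:
    """Return the unique characters of `text` that are not in `joyo`."""
    # Step 1: one character per line (Find What: (.)  Replace With: \1\n).
    one_per_line = re.sub(r"(.)", r"\1\n", text)

    # Step 2: sort the lines, then put a newline back after each one.
    lines = sorted(one_per_line.split("\n"))
    sorted_text = "".join(line + "\n" for line in lines)

    # Step 3: collapse runs of identical lines into a single character.
    unique = re.sub(r"(.)\n(\1\n)*", r"\1", sorted_text)
    unique = unique.strip()  # drop the blank lines left at the top

    # Step 4: delete every joyo kanji with one character class.
    if not joyo:
        return unique
    return re.sub("[" + re.escape(joyo) + "]", "", unique)


if __name__ == "__main__":
    # "一日" is only a stand-in for the full joyo list.
    print(non_joyo_chars("鯛と鮭\n一日一鯛", joyo="一日"))
```

As with the editor version, kana, punctuation and anything else that is not a joyo kanji is left in the output.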


Perhaps it *seems* like a lot of work, but if you do any amount of text processing, some practice with regexps is amply rewarded down the road. And this kind of processing uses the very simplest of regexp functionality. Sublime interactively highlights matches as you type in the "Find What" box, making it easy to construct regexps yourself. And Sublime uses the Boost Regexp engine, so you can do very powerful search-replace, all in a text editor. And if you find yourself doing the same processing repeatedly, you can make a macro in Sublime that'll automate it for you: code-free magic, in a text editor!


Tool to determine grade level of kanji in a text? - iani2004 - 2015-07-12

aldebrn Wrote:I threw together something because I had all the pieces on hand:
http://fasiha.github.io/kanjiyears/
If it's not too much work, could you add another statistic in the same colored squares?
For example, your default text could have this statistic about kanji (both unique and repeated):
Kanken 10: 3/80 (5/27)
Kanken 9: 1/160 (2/27)
Kanken 8: 1/200 (1/27)
Kanken 7: 1/200 (1/27)
Kanken 5: 1/181 (1/27)
where 27=10 kanji and 17 kana

So someone who knows hiragana and kanken 10 kanji will recognize 81% (17+5 out of 27) of the characters in the default text.

I think 27 characters (17 kana and 10 kanji) is more significant than 10 kanji, but that's debatable.
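
The arithmetic behind that 81% can be sketched in a few lines of Python. The counts below are copied from the list above; the function name `cumulative_coverage` is hypothetical, and the sketch assumes the reader knows all kana and every Kanken level down to the one shown.

```python
# Occurrence counts from the list above (the second number in each pair),
# for the 27-character default text.
KANA_COUNT = 17
KANJI_OCCURRENCES = {10: 5, 9: 2, 8: 1, 7: 1, 5: 1}  # Kanken level -> occurrences


def cumulative_coverage(kana_count, occurrences_by_level):
    """Share of the text readable after kana plus each successive level."""
    total = kana_count + sum(occurrences_by_level.values())
    known = kana_count
    coverage = {}
    # Kanken level numbers go down as difficulty goes up, so start at 10.
    for level in sorted(occurrences_by_level, reverse=True):
        known += occurrences_by_level[level]
        coverage[level] = known / total
    return coverage


if __name__ == "__main__":
    for level, share in cumulative_coverage(KANA_COUNT, KANJI_OCCURRENCES).items():
        print(f"kana + Kanken 10..{level}: {share:.0%}")
```

The first line printed corresponds to the 17+5 out of 27 case.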


Tool to determine grade level of kanji in a text? - aldebrn - 2015-07-13

iani2004 Wrote:27=10 kanji and 17 kana
To make sure I understand, you want (1) the tool to report the number of kana used? But is either kana system broken down by Kanken level or school grade? I guess I've always thought of kana as something you learn in bulk up front before learning kanji, and take it as a given…

The other part of your suggestion is, (2) in addition to showing what portion of each Kanken level appears in a text, also show what portion of the input falls in a given Kanken level? That could probably be arranged.
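
A rough Python sketch of what reporting both numbers per level might look like. This is not the code behind the kanjiyears page: `level_stats`, `level_of` and `level_sizes` are hypothetical names, and the three-kanji lookup table is a toy stand-in for a full character-to-Kanken-level mapping.

```python
from collections import Counter


def level_stats(text, level_of, level_sizes):
    """For each level, return (unique kanji used, level size) and
    (occurrences in the text, total characters in the text)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    stats = {}
    for level, size in level_sizes.items():
        in_level = {ch: n for ch, n in counts.items() if level_of.get(ch) == level}
        stats[level] = {
            "unique_in_level": (len(in_level), size),
            "share_of_text": (sum(in_level.values()), total),
        }
    return stats


if __name__ == "__main__":
    # Toy data for illustration only; a real table would cover every level.
    level_of = {"日": 10, "本": 10, "語": 9}
    level_sizes = {10: 80, 9: 160}
    for level, s in level_stats("日本語の本", level_of, level_sizes).items():
        u, size = s["unique_in_level"]
        n, total = s["share_of_text"]
        print(f"Kanken {level}: {u}/{size} ({n}/{total})")
```

The output uses the same "unique/level size (occurrences/total)" layout as the suggestion quoted above, with kana counted in the total.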