kanji koohii FORUM
Kanji Frequency in Wikipedia - Printable Version



Kanji Frequency in Wikipedia - wonderflex - 2009-08-20

cescoz Wrote:what can I say...Well organized and beautiful! but the calc doesn't work...I receive this as answer:######...something with the formula...
Just tried it myself and it worked just fine.


Kanji Frequency in Wikipedia - wonderflex - 2009-08-21

wonderflex Wrote:
cescoz Wrote:what can I say...Well organized and beautiful! but the calc doesn't work...I receive this as answer:######...something with the formula...
Just tried it myself and it worked just fine.
And now I see the problem. It's set so it can't be edited by people other than me. If you were to copy it into your own Google Docs, then it would work.

Sorry about that.


Kanji Frequency in Wikipedia - aberu - 2010-01-23

wonderflex Wrote:
And now I see the problem. It's set so it can't be edited by people other than me. If you were to copy it into your own Google Docs, then it would work.

Sorry about that.
Thank you so much for that spreadsheet! I have used it from the first couple hundred kanji I did in RtK1 all the way to completing it now, and looking at the percent complete every time I updated that number really helped motivate me to finish. Even now that I've just finished, I am still using it to look at the frequency of new kanji I want to learn before adding them to my Anki deck :) Thank you so much!


Kanji Frequency in Wikipedia - Snarp - 2010-03-13

This is a kind of selfish first post, but if someone would post or email copies of the .CSV or .TXT compound list(s) I would love them forever. (shang's domain looks like it expired and got squatted, so they're no longer downloadable from there.)

If someone were to email them to me, I'd be happy to host them and post the links here.


Kanji Frequency in Wikipedia - cholstra - 2010-04-07

Thank you guys for all the hard work you put into these. I found the frequency lists and the color coded lists incredibly useful for learning to read quickly and easily. The calculator and cumulative list was also wonderful for tracking my progress - it helped me stay motivated throughout!

Thanks again and good luck in your future ventures!


Kanji Frequency in Wikipedia - shang - 2010-04-19

Snarp Wrote:This is a kind of selfish first post, but if someone would post or email copies of the .CSV or .TXT compound list(s) I would love them forever. (shang's domain looks like it expired and got squatted, so they're no longer downloadable from there.)

If someone were to email them to me, I'd be happy to host them and post the links here.
Sorry I didn't notice this earlier. I've put the csv files back online at:

http://shang.kapsi.fi/kanji/

jawp-compounds.csv contains just kanji compounds, parsed with a simple (and flawed) algorithm. It might still be of some use, however, so I'm including it here.

jawp-mecab-words.csv contains word frequencies parsed with MeCab, so it should give a bit more realistic view of the frequencies of actual words.

jawp-kanji-full.csv is the original list of all kanji in Wikipedia and their associated data.
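shang doesn't describe the compound parser beyond calling it simple and flawed. As a purely illustrative guess (an assumption, not shang's actual code), a naive parser might treat every maximal run of consecutive kanji as one compound, which also shows how such an approach can go wrong:

```python
import re
from collections import Counter

# Assumption: "kanji" means the CJK Unified Ideographs block (U+4E00..U+9FFF)
# plus the iteration mark 々 (U+3005); rarer extension-block kanji are missed.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff\u3005]+")


def naive_compounds(text: str) -> Counter:
    """Hypothetical naive parser: count each maximal run of 2+ kanji as a compound."""
    runs = KANJI_RUN.findall(text)
    return Counter(run for run in runs if len(run) >= 2)


sample = "東京都知事選挙の結果。日本語の勉強は楽しい。日本語を話す。"
for compound, count in naive_compounds(sample).most_common():
    print(compound, count)
```

Here 東京都知事選挙 comes out as a single "compound" even though it is several words glued together, which is presumably the kind of error a MeCab-based word list like jawp-mecab-words.csv avoids.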


Kanji Frequency in Wikipedia - Snarp - 2010-04-21

Thank you, shang! You rule the world!


Kanji Frequency in Wikipedia - jcdietz03 - 2010-04-22

Sorry if this has been posted before.
How do you perform a frequency analysis of text in Japanese?
I don't need to analyze Wikipedia or anything, I would just like to analyze an anime script (about 2,000 words).
Is there free software that does this? My Google search failed (probably because I searched in English).


Kanji Frequency in Wikipedia - jcdietz03 - 2010-04-26

I ended up doing the analysis by hand. And here are the results!
http://spreadsheets.google.com/pub?key=ta9ANsrzYV9qAyI9dgrnbgA&output=html


Kanji Frequency in Wikipedia - yudantaiteki - 2010-04-26

If you just want to do kanji frequency, JWPce will do it automatically for you with the "count kanji" feature. Word frequency I'm not sure.
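For anyone who would rather script it than use JWPce, here is a minimal Python sketch of a kanji (not word) frequency count. It assumes the script is plain UTF-8 text and treats only the CJK Unified Ideographs block as kanji; word frequency would additionally need a morphological analyzer such as MeCab to split the text into words.

```python
from collections import Counter


def is_kanji(ch: str) -> bool:
    # Assumption: only the CJK Unified Ideographs block (U+4E00..U+9FFF) counts.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_frequency(text: str) -> list[tuple[str, int, float]]:
    """Return (kanji, count, percent of all kanji) tuples, most frequent first."""
    counts = Counter(ch for ch in text if is_kanji(ch))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(k, n, 100.0 * n / total) for k, n in counts.most_common()]


# For a real script, read it with open("script.txt", encoding="utf-8").read();
# the file name is hypothetical.
sample = "今日は雨です。明日も雨でしょう。"
for kanji, count, pct in kanji_frequency(sample):
    print(f"{kanji}\t{count}\t{pct:.1f}%")
```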


Kanji Frequency in Wikipedia - shang - 2010-05-20

I'm working on some new stuff with the Wikipedia corpus analysis. Currently rewriting the parser to get better data faster, so nothing too interesting yet, but here's one new table I've generated so far:

http://shang.kapsi.fi/kanji/kanji-article-count.csv

Instead of counting all the occurrences of a kanji, it counts the number of different Wikipedia articles that a kanji appears in. This, for example, de-emphasizes kanji that are rarely used, but which might appear over and over in a limited set of articles discussing some highly specialized topic.

I'll have more time to code during the weekend, so I'll try to do something similar for vocabulary, plus a bunch of new statistics.
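The difference between the two measures can be sketched in a few lines of Python. This is an illustration on made-up toy articles, not shang's parser: the only change between the two functions is that each article contributes a set of kanji, so a kanji repeated within one article is counted once.

```python
from collections import Counter


def is_kanji(ch: str) -> bool:
    # Assumption: CJK Unified Ideographs block (U+4E00..U+9FFF) only.
    return "\u4e00" <= ch <= "\u9fff"


def occurrence_counts(articles: list[str]) -> Counter:
    """Total number of times each kanji appears across all articles."""
    counts = Counter()
    for text in articles:
        counts.update(ch for ch in text if is_kanji(ch))
    return counts


def article_counts(articles: list[str]) -> Counter:
    """Number of distinct articles each kanji appears in."""
    counts = Counter()
    for text in articles:
        # A set collapses repeats, so each article adds at most 1 per kanji.
        counts.update({ch for ch in text if is_kanji(ch)})
    return counts


# Toy corpus: one specialized article repeats 鱗 ("scale") several times.
articles = ["魚の鱗。鱗の形。鱗の色。", "日本の魚。", "日本の山。"]
print(occurrence_counts(articles)["鱗"], article_counts(articles)["鱗"])
```

By raw occurrences 鱗 outranks 日 and 本 in this toy corpus; by article count it drops below them, which is the de-emphasis described above.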


Kanji Frequency in Wikipedia - aaroncp - 2010-10-14

These lists are very much appreciated. I will probably be using these to continue my kanji studies once I have finished RTK 1.


Kanji Frequency in Wikipedia - s0apgun - 2012-03-13

Someone posted this vocab frequency list here, I think...

https://docs.google.com/spreadsheet/ccc?key=0AscWM0WNU3s4cHNLaUJLWDE1d20wSWVPTTNIalNoQWc#gid=0


Kanji Frequency in Wikipedia - Seren - 2012-03-25

http://www.2shared.com/document/asuKWv_B/Kanji_Frequency_in_Wikipedia4.html

I sorted the kanji frequency on wikipedia into the following categories:

RTK1
RTK3
RTK Supplement (3008-3030)
Jinmeiyou
Other

and plotted it in a logarithmic graph just to see what it would look like.

This seems to indicate that Heisig did get pretty much all of the important kanji:

Most in RTK1 have more than 1,000 occurrences (up to 200,000)
Most in RTK3 have more than 100 occurrences (up to 15,000, plus one at 50,000)
(The RTK Supplement Kanji are between 100 and 10,000)

More importantly:
Of the other kanji, there are about 20 that have more than 1,000 occurrences that you might want to learn, since they aren't super rare (as much as 0.01% frequency!!!)

The rest don't really occur with much frequency at all.

I'm not quite sure how to interpret the line for the Jinmeiyou, as I have no idea how many of these characters are used as parts of names.


Edit: When I wrote "RTK2" in the doc, I really meant "RTK3"


Kanji Frequency in Wikipedia - Marble101 - 2012-07-02

The OP posted that 2,062 kanji cover 99% of Wikipedia. That number is so tantalizingly close to 2044 that I wonder if it would be worth putting them into a spreadsheet alone to learn RTK-style after completing RTK.

Has anyone done this? And if so, how useful has it been?
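A figure like "N kanji cover 99%" comes from a cumulative sum over the frequency list: sort kanji by count, then add counts until the running total reaches the target share of all kanji occurrences. A minimal sketch with made-up toy counts (loading the real counts from jawp-kanji-full.csv is left out, since that file's column layout isn't described in this thread):

```python
def kanji_needed_for_coverage(counts: dict[str, int], target: float = 0.99) -> int:
    """Smallest number of top-frequency kanji whose occurrences reach `target` coverage."""
    total = sum(counts.values())
    running = 0
    ranked = sorted(counts.values(), reverse=True)  # most frequent first
    for rank, n in enumerate(ranked, start=1):
        running += n
        if running / total >= target:
            return rank
    return len(ranked)  # only reached for an empty table


# Toy counts for illustration only.
toy = {"日": 50, "本": 30, "人": 15, "鱗": 4, "鬱": 1}
print(kanji_needed_for_coverage(toy, 0.99))
```

Because the last few percent are spread over a long tail of rare kanji, raising the target from 99% toward 100% grows N much faster than going from 90% to 99%.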


Kanji Frequency in Wikipedia - twofoe - 2012-08-15

To know the "top 1500 kanji" in Wikipedia, you'd need to memorize these 26 kanji in addition to RTK1:
也劉塞幡庄弥梁狙葛李禄筑繋誰頃嘉殆畿卿拳阜薩那龍祀澤

For the "top 2000", these additional 162 will do:
柴砦鱗庵俺駿竣臼脊綴妖稽或斬倭僅伎倶鳳剥勾哨噂叩呪吊唄堆填宏宋尻崖嶺鷹揃捉播擢洲沙洛浩淳潰灘汲汎淀釜狼萩蓮蘭芦藍萌蘇蒙蓋菅遼隙隈晋昧股膝腺腫胡椅桁橘棲椎樽樺槍槻牽牝牡斑琉瑞玩癌磐磯祐稀窟靖篠纂篇綾綻蜂哉謎讃詣諏輔貌貼蹴跨醍醐麺鍋鍵闇鞍鞭顎餌馴鴨麓挫遡彗勃牌骸秦雀夷戚斐於甥畠亮胤敦翔陀菩迦淵嶋國堺應輌厩魏籠趙湘曾圓閏哺遽廠瘍

And for the 99th percentile, another 28:
云侶俣凌喧捧憧湧湊溜渕曖枕楊楕痕戴賑霞頓鷲燕廻牟袁揆學裔

Here's all 216, all in order of frequency:
也劉塞幡庄弥梁狙葛李禄筑繋誰頃嘉殆畿卿拳阜薩那龍祀澤柴砦鱗庵俺駿竣臼脊綴妖稽或斬倭僅伎倶鳳剥勾哨噂叩呪吊唄堆填宏宋尻崖嶺鷹揃捉播擢洲沙洛浩淳潰灘汲汎淀釜狼萩蓮蘭芦藍萌蘇蒙蓋菅遼隙隈晋昧股膝腺腫胡椅桁橘棲椎樽樺槍槻牽牝牡斑琉瑞玩癌磐磯祐稀窟靖篠纂篇綾綻蜂哉謎讃詣諏輔貌貼蹴跨醍醐麺鍋鍵闇鞍鞭顎餌馴鴨麓挫遡彗勃牌骸秦雀夷戚斐於甥畠亮胤敦翔陀菩迦淵嶋國堺應輌厩魏籠趙湘曾圓閏哺遽廠瘍云侶俣凌喧捧憧湧湊溜渕曖枕楊楕痕戴賑霞頓鷲燕廻牟袁揆學裔

Hope that helps.

EDIT: +1 error : \
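Lists like the ones above can be produced by taking the top N kanji of the frequency list and dropping anything already in RTK1, while keeping frequency order. A sketch, assuming you supply the frequency-ordered kanji and the RTK1 set yourself (the function name and the toy strings below are placeholders, not real data):

```python
def extra_kanji(by_frequency: str, known: str, top_n: int) -> list[str]:
    """Kanji in the top `top_n` of a frequency-ordered list that are not in `known`."""
    known_set = set(known)  # set lookup keeps the filter fast for ~2,000 known kanji
    return [k for k in by_frequency[:top_n] if k not in known_set]


# Placeholder data: a short frequency-ordered string and a tiny "known" set.
by_frequency = "日人也本劉大"
rtk1 = "日人本大"
print(extra_kanji(by_frequency, rtk1, 4))
print(extra_kanji(by_frequency, rtk1, 6))
```

Running it once per threshold (1500, 2000, the 99% cutoff) and taking the difference between consecutive results gives incremental lists like the 26/162/28 split above.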


Kanji Frequency in Wikipedia - Animosophy - 2013-03-25

Thanks, ^ that's really helpful :)

If anyone is using the "Heisig's Remembering the Kanji 1+3 with 2010 joyo kanji" deck, only the following 18 kanji remain from the above list...

澤 堺 應 輌 厩 魏 籠 趙 湘 曾 圓 閏 遽 廠 袁 揆 學 裔

I thought I'd add nihongoresources' novel kanji list into the mix as well. Another 935 kanji are unique to Pomax's 1300-novel analysis if you finish the above deck, leaving only 8 kanji unique to the Wikipedia frequency analysis: 應 輌 魏 趙 廠 袁 揆 裔.

Novel kanji ordered by number of strokes...

几 孑 亢 壬 夭 乍 匆 叮 弗 戊 辷 冲 匈 卍 奸 屹 忖 戌 戎 芍 芒 佇 佝 佞 兌 吝 吶 吼 夾 尨 彷 忸 扼 抒 抓 抛 曵 杞 杣 汞 汪 沁 皀 糺 芟 芬 阮 乖 佩 佯 侏 侘 其 凭 刮 刳 卦 卷 呟 呵 呶 呷 呻 咀 咄 咆 囹 坩 宕 岨 峅 帚 庚 弩 徂 忝 忿 怏 怙 怫 拇 拈 拗 枉 枡 歿 毟 沓 沛 沮 沽 炒 炙 爬 狎 疚 疝 盂 穹 肭 苜 苟 苧 苹 虱 軋 邯 俎 俘 俚 剋 匍 呱 咫 咬 咸 哄 姨 孩 屎 峙 巷 怱 恆 恍 恟 恪 恫 恬 扁 拮 拱 拵 挌 昵 枳 枸 柩 毘 洟 炯 炳 炸 牴 玳 玻 珂 疥 癸 盃 盈 眇 眈 矜 砒 秕 紆 胛 舁 茗 茘 茫 茱 茴 茹 荊 荏 袂 陋 俯 倨 冤 凉 剔 匪 叟 哥 哭 哮 埃 埒 娑 娜 宦 宸 屓 崋 恙 悄 悋 悍 悖 悚 拿 挾 捏 捩 栴 桎 桓 殷 氣 浙 浚 浣 涌 涎 涕 烟 狷 畚 疱 疳 疼 眞 祗 祠 秣 窈 笊 笏 紊 缺 耄 耆 耿 胯 胼 舐 茣 荼 莢 袢 豺 躬 迸 陝 陞 偃 偕 偸 冨 凰 勒 唸 啀 啖 啜 娶 婀 崑 崗 崙 帷 徘 悵 悸 悽 惘 戛 扈 掏 掬 旌 桝 梃 梔 梳 毫 涵 涸 淘 淙 淪 烽 猊 猖 猜 甜 疵 痍 眷 硅 竟 羚 聊 舂 舳 菎 菟 菠 菲 蚯 蛆 衒 袱 訛 訥 貶 趺 逍 逕 逡 釧 閊 傀 剴 厩 啻 啼 啾 喀 喃 喇 喊 喘 喨 堙 堡 堺 壹 壺 奠 奢 媚 幇 廂 弑 徨 惠 惻 愕 捲 掣 揉 揣 揶 曾 棍 棕 棗 棹 椒 椚 渣 渫 游 渺 湘 湮 焙 焜 犇 猩 琥 琺 痙 稈 窘 竦 絮 翕 脾 腋 萱 萵 葭 葷 蛔 蛛 蛞 蛟 訝 訶 詛 貽 跋 跏 跖 閏 靱 馭 馮 傳 傴 剽 剿 嗄 嗚 嗜 嗟 嗤 圓 奧 媾 嫋 尠 愍 愾 慊 搏 搖 搗 搦 摸 斟 會 椴 楚 榔 歇 溯 溲 溷 滂 滄 滔 煌 煖 煥 瑕 瑜 畸 痳 痼 盞 睥 睨 睫 矮 硼 碌 碎 稙 稟 稠 筥 筮 絛 絽 腥 腱 與 蒋 蒟 蒡 蒻 蓆 蜀 蜃 蜈 蜉 蜒 裨 詭 誂 誅 跫 躱 辟 酩 飩 飫 僥 兢 劃 團 塹 嫖 嫣 嫦 嫩 孵 寥 對 嶄 幔 慇 慚 慟 慥 慴 慷 敲 榕 榜 榧 榮 榲 榴 槇 槐 槓 滾 漲 瑣 瑪 瑯 瘋 箏 箒 箝 綯 綽 翡 膀 膂 膃 膈 蓴 蓼 蔗 蜘 蜚 蜥 蜷 蜻 蜿 誑 誣 誦 誨 跼 輓 鄙 鉾 銓 銜 颱 骰 髣 儚 劈 嘶 嘸 噎 嬋 寫 廣 慙 慫 慳 憔 憚 憫 憮 撓 撥 撩 樟 漿 潸 潺 澁 澆 澎 熨 熬 璋 瘠 瘡 瘢 瘤 皚 皺 瞋 碾 磊 磋 箴 篁 篆 緘 緞 緬 縋 羯 翩 膠 膵 蔬 蕁 蝙 蝟 蝠 蝴 蝸 蟒 褪 褫 諍 豌 賣 賤 踞 輜 醋 銷 霄 靠 鞋 鞏 頤 餃 駑 駘 駝 魄 魯 鴇 鴈 麩 麾 僭 儕 嘯 噤 嚆 嬖 學 嶮 懈 懊 擂 擅 樸 橄 橙 橢 歔 澤 澳 濛 熾 燗 甌 瘰 瘴 盧 瞞 瞠 磔 窶 篝 篦 縉 縊 縒 縟 縢 罹 艙 蕭 蕾 薇 薐 薔 薨 螟 蟇 褶 襁 諡 諤 諳 諷 豎 蹂 赭 辨 醗 頷 頽 駢 駱 髷 髻 鬨 鮑 鴉 鴛 鴦 鴫 嬲 孺 彌 徽 懃 懦 擡 擧 擯 擱 斂 朦 檄 檣 濤 濱 燧 牆 獰 癇 癈 瞰 篳 簀 簇 簒 簗 糜 縲 縷 縹 縺 繃 罅 翳 聰 聳 膺 膾 臀 艱 螻 螽 蟋 謗 谿 豁 賽 蹈 蹉 蹊 蹌 輾 遽 邀 邂 鍍 鍔 鍜 闊 闍 顆 颶 餞 馘 駸 駻 鮟 鵄 齋 嚠 擲 擽 斃 旛 檳 殯 瀉 瀑 燻 瓊 瞽 禮 禰 穢 繙 繚 繧 翹 臍 臑 藪 蟠 蟯 覲 謦 謫 謳 贄 贅 蹣 轆 鎗 鎭 闖 鞣 馥 騏 魍 鵞 鵠 鼬 龜 嚥 嚮 壜 壟 懶 攀 曠 瀝 瀟 瀧 犢 礪 羸 羹 臘 艤 艨 藹 藺 蘆 蘊 蟷 蟾 襦 譏 譚 蹲 蹶 轍 鏖 鏤 靡 饂 鯰 鵯 鶉 攘 朧 瀰 瀾 矍 糯 繻 繽 罌 臙 蘚 蠕 蠣 襤 譫 躁 辮 醵 闡 韜 飄 驀 鹹 齟 囀 囂 囃 囈 巍 櫺 櫻 殲 爛 瓔 癩 癪 籐 蠢 譴 贔 鐵 霹 饐 饑 饒 騾 驃 鬘 魑 鶯 鶸 鶺 麝 黯 齎 齧 儼 彎 籠 羇 覿 讀 贓 贖 躑 轢 霽 韃 顫 驍 驕 鬚 鬻 龕 攣 攪 攫 罐 蠱 躙 轤 邏 髑 鷦 黐 黴 癲 羈 艷 讒 軈 顰 驟 鬢 魘 齲 齷 糶 顱 讚 顴 驥 鑽

Total of 3972 kanji if you learn all of them.

4000 kanji sounds like a pretty good foundation.


Kanji Frequency in Wikipedia - yudantaiteki - 2013-03-25

4000 kanji is a good *foundation*? It's a reasonable life goal!


Kanji Frequency in Wikipedia - uisukii - 2013-03-25

Maybe that was a typo, and Animosophy meant to type "hanzi". :P


Kanji Frequency in Wikipedia - Animosophy - 2013-03-25

But I want to be like a native Japanese university student now :'(

After those, there are also the kanji in news frequency data! Here are another 229 unique to this list, in order of frequency.

咋 條 證 潘 栩 實 覃 藏 剱 鏝 鰒 蔟 圀 焉 罠 猷 礒 涛 崔 隕 邊 晝 怡 俑 佛 鐸 鉉 諫 泄 鑼 蛯 糀 獏 澂 櫃 曄 彿 嶌 頸 稷 瀋 昊 徊 姜 鑚 鍼 酊 鄒 螢 董 舛 縣 箙 甑 瑙 珀 熙 熏 澹 櫨 桿 巖 嵜 娩 墟 咤 劭 來 佗 齊 餡 霍 雖 隋 陜 鉗 鈞 釉 邱 衾 蛉 虻 臺 腿 翅 纓 筬 禹 礫 癬 瀚 滸 溟 湃 欅 梠 揄 寨 娟 堯 址 剌 凖 冽 冑 鱸 鰤 鰓 鰊 鰈 鮓 鬆 髏 饌 饉 韶 韋 靼 雍 隘 閔 鐙 鐔 鐐 鎔 鈿 醂 鄲 邉 邇 遙 逅 轉 輦 軛 趾 謐 詈 襷 袰 蠍 蟀 號 葯 莱 莪 莚 茵 茲 艸 舫 舩 膣 腟 腓 繹 繪 繩 綏 粂 籃 箟 筧 禾 碆 砿 矗 瞼 疽 疸 疆 甞 甕 瓷 珎 牀 燼 燵 濬 滓 溪 渠 淺 淇 洙 泗 泓 檻 樓 樂 槿 楡 棣 栫 栢 杳 晄 斌 收 戮 戀 悴 恂 彭 廼 廈 嶼 岱 尹 寇 姶 姚 奘 夘 壯 塙 囿 噪 吽 卉 匣 册 兒 儡 儘 傅 假 俥 俟

That makes 4201! They're like Japanese language MP points C:


Kanji Frequency in Wikipedia - uisukii - 2013-03-25

Animosophy Wrote:But I want to be like a native Japanese university student now :'(
Studying what, exactly? The 漢字検定?


Kanji Frequency in Wikipedia - Eikyu - 2013-03-25

Don't focus too much on kanji. It's been said before, but kanji is only a (small) part of learning Japanese. Most of these advanced kanji will be useless. That being said: 隕 is in 隕石。


Kanji Frequency in Wikipedia - dizmox - 2013-03-26

It's impressive if someone really understands all of these in the Japanese sense of understanding kanji, but just writing them? Not so much...


Kanji Frequency in Wikipedia - gombost - 2013-03-26

Animosophy Wrote:That makes 4201! They're like Japanese language MP points C:
Why don't you make actual words your Japanese language MP points? You can learn many thousands of words with the set of kanji found in RTK 1 & 3 and add new kanji to your RTK deck on the fly if they are needed.


Kanji Frequency in Wikipedia - Animosophy - 2013-03-26

I was under the impression that the average Japanese university student can recognise something like 4000-5000 kanji, with a vocabulary of 45,000-50,000 words. It was a reference of a reference of a book, found somewhere on Yahoo Answers, that I probably can't find again.

Not sure I even know half as many words in English based on this test: http://testyourvocab.com/ (I got 18,600 I think), but since a lot of Japanese words can be deciphered with kanji, it didn't seem that daunting. On the flip side, readings seem to be all over the place.

All that said, I'm not in a hurry. I just know that if I want to be anything like fluent in Japanese, I'd better put some years into it and set the bar high :)