
Nayr's Core5000 deck (Frequency Dictionary of Japanese)

#26
I am slowly coming around to the RTK Lite idea, so I thought I'd use this Core5000 deck to make an RTK Lite list, as its source frequency dictionary is based on the latest Japanese corpus research.

There are 1503 kanji that appear in Core5000's "word" field. Here they are (also at https://gist.github.com/fasiha/b9daaca92c98a04601c9 for revision-controlled posterity):

Core5000, all 1503 kanji, as they appear in Core5000

事言思物何私無行時人若来見今良所自分中後方訳本当持出考入作聞聴場合話使日風前多一子供非常気取知感番二同必要仕余僕皆彼食書次結構問題例目眼頃上他家付陽間違受葉少買手好返掛終意味形三最初大住近特誰友達緒生活国現在高悪乗変会社実際先女心金顔町街及体読昔教対水置楽声普通残車度強力全呼局歩男性学校世界状態然飲新早小相母以関係四店頭電長夜別者親名部立毎族果況代覚東京続俺父確法説明屋朝得選面白戻勉下始犬嬉遊理由簡単死回内容経験木点音海応与利用働杯共存絶切足走待写真五寝英語決忘口送姿期逆頑張示々道有難山程願昭和向連影響病院年花求情報十開島重婚認歴史増外進起嫌駅映画身客質含地域我娘平成図紹介安夢規定色探興売化印象品参加夏解件敢的様答過配笑午動主困環境比引辺離火施練習越企業奴怖絵酒原因守広猫調六落打夫数割丈申可能述是料歌幾互着記憶基月君魚旅健康深伝集戦争流致厳不限愛途研究済描驚元素晴座機信古識低曲去土振評価息似悲雨詳紀昨喜便迎耳表注雰囲痛将転川隣段律妻冬具発繰泣七紙努判断両念庭払船九授亡止議茶直接効触飼民殺徴神暮飛雑誌怒整備建談絡懸命突首設指失光並八美優異婆至育種類商胸頼材勝運捨幸横伴予馬完格位展飯危険想像許文借準春寒般第押責任択負技術減焼慣推半条医秋趣徒従検討占査保正字空役球公園個消激短薬試遅資扱摘除週政府計階量田舎寂再勤吸希望急勧齢疲弱渡患壁路反提肩匂銀購右雪仲緑狭敗功協伯産卒降野菜告移血涙逃警察兄細静腰催瞬暗届窓伸森標更謝腕鹿忙遠苦労課仰為悩論録温眺極積故鳥市眠固値避穴捕洗左珍尋制登額塩奥組傾拡疑肉差章背景担豊助詰阪支抱追権算納織薄治療昼赤較荷魅村撮導韓営南季節枚輩員距泊充散黒石挨拶携帯休甘弟卵服妹交訪汗改善曜咲暑慢浮益裏姉継北宿掲都就職派舞台弾宅腹呂香菓被害椅桜躍恥皮革夕抜敵側祭沖縄幼稚処勢暇隠緊帰席歯星砂糖維太湯約束米周伺精冷適宗範裕械寄費層旦那揃誘採飾純植契叫駄髪管工婦監督鍋鼻復塗老倒翌則販諦障迷惑締餌床黙護等演奏泉室載肌症骨偉熱援握司盛犯罪批師浜脳器寺底刺吹訴把震昇複観祖抵狙迫唯青了未視波誕汚畑履膝案務孫包式虫到痩遥収掃徐促癖速製慌措油線造筋策濃館靴礼炒投玄込証沸剣巻恐牛乳否踏干箱泳邪踊攻撃弁久幅挑独壊仮輸満鍵詩西項塾県耐揮閉芝居析釈奪傷蔵庫慮港換防妊娠貸破削遣沿輝襲募属富喉布団宇宙唇釣覧拾絞江戸天敷洋承偶懐机勇皿微妙財拠訓区鮮恵煮旨省放己端模鳴恋句戚央請浴縁吐渋谷玉御貴城覆橋巨志易抗禁令績埋依招陥駐氏蓋墓針秘密該礎草義扉廊漢皇根農丁寧悔雲操統留著救巡潰概奮鏡千煙照委膨列坂索飽悟糸逮渉枝科免講円臓騒溜晩給各幹貢献臣紅控魔士叱王税預柔憧池柱鉄殊裁専門揺匹憲埼燃末胃州袋競獲板挙脇測欠遺魂滞廃闇栄養乾称鍛燥添陰漬号混築穏崩百尻順舌総滑漂永冗巣胞犠牲殴率湖稼劇看涯湾拭武需践刻頻繁快修沈爪氷鋭雌跡毛慎尊敬尽謎井喫嫁災型聖乱癒磨粉賛誤均翻房才株誇液衝麦掘詞崎矢祈膚併徹脱党焦退編林帽盗冊紐岩粋緯秀瞳歳診缶岸領版丘酸贅沢融黄片軽延厚浅璧雇雄欧酢拒札折刷曖昧福籠稿装棒圧矛盾縦隙羨潜万漏孤溶奇滅栗幌棋輪僚芸仙染頬宣貧竹暴角郵賃蛇譲遡鶏排兼奈穫抽狂酵羽盤丸濯勘臭級顧爆軍隊松停寿幕励辞披露宴候唱郷畳祉栽培隅斉盆審岡仏欲捜柄芋怪鉛筆遂熊贈枠児童蹴蜂蜜傍衣郎烈俳華抑郊衛憩猿蓄賀漠脂肪炎敏瓶縛粧坊射臨豚源餃堂裸撫芽嵐豆損肝凝蚊駆棚虐賑獄豪替鬼往縮符貯哲鮎威酔稽創荻窪麺綻双傘炭鞄斜補卓侵縫浸箸滝透即克眉俗刀官略核羊姜冒倉径群訟刃兵債副慰梅霊壼祝儲劣涼泡賞銃撒霧肢錯顕貿挿賢腐惨辛餅鎌亀衆吉祥貫潟箇擦儀舗杉彫爽肘這捧迅緩枯絆槍秩序倍覗棄鐘歓軸頂睡葱据庶騰挫誓里典偏屈軒紛埃既網刑執髭乏汽塔柿署唆
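A minimal sketch of how a list like the one above could be pulled out of the deck's "word" field. The original was put together in a JavaScript console; this Python version is only an illustration. It assumes kanji can be approximated by the basic CJK Unified Ideographs block plus the iteration mark 々, and the `sample_words` values are hypothetical stand-ins for real field contents.

```python
import re

# Assumption: "kanji" means the basic CJK Unified Ideographs block
# (U+4E00 to U+9FFF) plus the iteration mark 々 (U+3005).
KANJI = re.compile(r"[\u4e00-\u9fff\u3005]")


def unique_kanji(words):
    """Return each kanji once, in order of first appearance."""
    seen = {}  # dicts keep insertion order, so this doubles as an ordered set
    for word in words:
        for ch in KANJI.findall(word):
            seen.setdefault(ch, None)
    return list(seen)


# Hypothetical sample of "word" field values, not taken from the deck.
sample_words = ["事", "言う", "思う", "物", "時々", "言葉"]
print("".join(unique_kanji(sample_words)))
```

Kana are skipped because they fall outside the matched ranges, and a kanji seen a second time (言 in 言葉) is not repeated.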

Here's the 1503 sorted in Heisig RTK1+3 style, including eight in Core5000 not in RTK1+3, i.e., take the RTK1+3 order and keep the Core5k kanji.

一二三四五六七八九十口日月田目古冒明唱品呂早世胃旦自白百中千舌昇丸肘専占上下卓朝員見児元頑負万句肌的首乱直具真工左右有貢項刀刃切昭則副別丁町可頂子了女好母貫兄克小少大多夕外名石砂削光太器臭妙省厚奇川州順水氷永泉原願泳沖江源活消況泊湖測土吐圧埼涯寺時均火炎畑災点照魚里黒量埋同向字守完宣安宴寄富貯木林森枠棚植椅枯村相机本札案燥未末昧味妹株若草苦薄葉模漠墓暮眺犬状黙然荻猫牛特告先洗介界茶合塔王玉現狂皇全理主注柱金釣針道導迅造迫逃辺巡車連輸前各格略客額夏処条落冗軍輝運夢高塾京涼景舎周週士吉売学覚栄書攻敗枚故敬言警計獄討訓詰話詩語読調談式試域栽載戚成城威滅減浅止歩渉頻企歴武正証政定走越是題建鍵延誕礎衣裁装裏壊遠猿初布幅帽幕幌市柿姉帯滞刺制製転芸雨雲冬天橋立泣章競諦童瞳鐘商適敵叱匂頃北背比皆混旨脂毎敏梅海乾腹複欠吹歌次資姿培音暗識鏡境亡望方坊肪訪放激脱説鋭増贈東妊染燃歳県地池虫蛇独風己起改記包胞泡亀電滝豚遂家嫁豪場湯羊美洋詳鮮達羨差着唯誰焦集進雑雌準奮奪確午許歓権観羽習翌曜濯困固国団因園回店庫庭床磨心忘認志誌患思応意想息憩恵恐惑感忙悟怖慌悔慣慎憶憧添必手看我義議犠拭抱抗批招打捨摘挑指持拶揮推提損拾担拠描操接掲掛研械鼻刑型才財材存在携及吸扱丈史更双護獲奴怒友抜投設撃支技枝肢怪軽督寂反坂板返販爪乳浮将採菜受授愛曖払広拡弁雄台治始窓去法会至室到致互棄育充銃流唆出山岩炭崩密蜜嵐崎入込分貧公松訟谷浴容溶欲裕鉛沿賞党堂常皮波婆披破被残殊列烈死瞬耳取趣最撮恥職聖敢聴懐慢買置寧環夫規替賛潜失鉄臣蔵臓賢臨覧巨拒力男労募劣功勧努励加賀脇協行律復得従徒待往径彼役徹徴微街稿稼程税稚和移秋私秩秘称利穫香季委秀透誘稽米粉粧迷粋謎奥数類膝様求球救竹笑筋箱筆等算答策築人住位仲体件仕他伝仏休仮伯俗信依例個健側停値倒儀仙催使便倍優宿傷保付符府任賃代袋貸化花傾何荷傍俺久内柄肉腐座挫卒傘以似併瓶営善年夜液換融施遊旅物易尻履屋握屈掘居据層局遅漏刷尽沢訳択昼戸肩房戻涙雇顧示礼祥祝福祉社視奈慰禁宗祭察擦由抽油宙届軸押挿申伸神捜果菓課裸析所祈近折哲誓断質訴昨作雪録尋急穏侵浸寝婦掃当争事糖康逮君群耐需端両満画歯曲料科図用備昔錯借措散庶席度渡焼半伴判巻勝片版乏芝不否杯矢族知挨矛柔務霧帰引強弱沸費第弟号誇汚与写身射謝老考教者煮著箸署暑狭頬追師官管父交効較校足促距路露躍践踏骨滑鍋過阪際障隙陽防院隊降階隣隠陥穴空控突究窪探深丘兵浜糸織縮繁縦線綻締維練緒続絵統絞給絡結終級紀紅納紛紹経約細索総繰継緑縁網緊縛縄幼後幾機玄蓄係孫懸御服命令齢冷領勇通踊疑凝範犯危腕卵留貿印興酒酵酢酔配酸尊豆頭短豊喜皿血盆盗温蓋監盛塩銀根即節退限眼良娘食飯飲飾餌館餅養飽既概平呼評希胸離殺爽純辛辞壁璧避新親幸執報叫収勢熱核刻該述術寒譲素麦青精請情晴静責績積債漬表契喫害割憲生星性牲産蜂縫寿春奏実棒勤漢難華睡乗今含念陰予序預野兼嫌鎌西価要腰漂標栗覆煙南献門問間闇簡開閉聞倉創非俳排悲罪輩扉喉候決快偉違緯衛韓干肝汗軒岸幹芋宇余除徐途斜塗束頼速整剣険検重動働種衝病症痩疲痛癖医匹区殴欧抑仰迎登発廃僚療彫形影杉顔膨参惨修珍診文対蚊斉済楽薬率渋央英映赤変跡恋湾黄横把色絶甘棋期基勘貴遺遣潰舞無組狙祖査助畳並普顕霊業僕共供異港暴爆選井囲悪円角触解再講購構論輪偏編冊典氏紙婚低抵底民眠捕舗補郊部都郵那郷響郎廊盾派衆段鍛司伺詞飼般盤船孤益暇敷来気汽飛沈妻面麺革靴声眉誤承極芽邪釈番審翻毛宅為長張髪展巣単戦弾桜脳悩厳挙鳥鳴鶏島援緩属偶隅逆遡岡缶揺就蹴免晩勉象像馬験駐駆駅騒駄驚騰膚慮劇虐鹿熊能態演震振娠唇農濃送関咲鬼魂魔魅襲雰箇癒潟髭儲嬉揃捧撫撒溜葱這遥槍絆紐賑鞄鮎覗々炒贅籠餃姜壼埃
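The "take the RTK1+3 order and keep the Core5k kanji" step can be sketched as below. This is an illustration rather than the script actually used. `rtk_order` is only the first few characters of the Heisig order, `core_kanji` is a made-up sample, and the function name is hypothetical.

```python
def sort_in_reference_order(kanji, reference_order):
    """Keep the reference ordering, then append kanji the reference lacks."""
    wanted = set(kanji)
    known = set(reference_order)
    # Walk the reference list and keep only the kanji we care about.
    in_reference = [k for k in reference_order if k in wanted]
    # Anything missing from the reference goes last, in its original order.
    leftovers = [k for k in kanji if k not in known]
    return in_reference + leftovers


rtk_order = list("一二三四五六七八九十口日月田目古")  # short excerpt only
core_kanji = list("日々炒一目口")  # hypothetical sample
print("".join(sort_in_reference_order(core_kanji, rtk_order)))
```

Characters missing from the reference order (here 々 and 炒) end up at the tail, which matches how the extra characters trail the Heisig-sorted lists.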

The way this list was generated isn't ideal since the RTK1 vs RTK3 split is no longer meaningful in this context, but as far as I know, nobody has made a combined RTK1+3 list that interleaves RTK3 kanji into their natural place in the RTK1 ordering. This will be made easy by a kanji dependency graph, which I'm working on on the side, using some fancy fancy tools.
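For what it's worth, here is one way a dependency graph could produce such an interleaved ordering: a topological sort, so that every kanji comes after the parts it is built from. This is a sketch only and says nothing about the tools mentioned above. The `components` mapping is toy data, and it uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Toy, hypothetical dependency data: each kanji maps to the simpler
# kanji it is built from. A real graph would cover RTK1 and RTK3.
components = {
    "明": ["日", "月"],
    "唱": ["口", "日"],
    "品": ["口"],
    "日": [],
    "月": [],
    "口": [],
}

# static_order() yields every node after all of its predecessors,
# so components always come before the kanji that use them.
order = list(TopologicalSorter(components).static_order())
print("".join(order))
```

A topological sort only guarantees the parts-first constraint. Staying close to the existing RTK1 sequence would need an extra tie-breaking rule on top of this.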

Core5k, top 1000 kanji, Heisig order, including two not in RTK1+3
一二三四五六七八九十口日月田目古明品呂早世旦自白中昇占上下朝員見元頑負句肌的首直具真工左右有項切昭則別町可子了女好母兄小少大多夕外名石砂削光太器妙省川水泉原願泳沖江活消況泊土吐寺時火畑点魚黒量同向字守完安寄富木森植椅村相机本案未味妹若苦薄葉模暮眺犬状黙然猫牛特告先洗介界茶合玉現全理主注金釣道導造迫逃辺車連輸前格客額夏処条落輝運夢高塾京景舎周週売学覚書攻敗枚故言警計討訓詰話詩語読調談式試域載戚成減止歩企歴正証政定走越是題建鍵誕裏壊遠初布幅市姉帯刺制製転雨冬天立泣章諦商適敵匂頃北背比皆旨毎海腹複吹歌次資姿音暗識境亡望方訪放激説増東妊県地虫独風己起改記包電家場湯美洋詳鮮達差着唯誰集進雑準奪確午許権観習翌曜困固国団因園回店庫庭床心忘認誌患思応意想息恵恐惑感忙怖慌慣憶必手我議抱批打捨摘挑指持拶揮推提拾担拠描接掲掛研械鼻財材存在携及吸扱丈史更護奴怒友抜投設撃支技督寂反返販乳浮将採菜受授愛払広拡弁台治始窓去法会至室到致互育充流出山入込分公谷浴容裕沿常皮波婆破被残死瞬耳取趣最撮恥職敢聴懐慢買置環夫規失蔵覧力男労募功勧努加協行律復得従徒待彼役徴微街程稚和移秋私利香季誘米迷奥数類膝様求球笑筋箱等算答策人住位仲体件仕他伝休仮伯信例個健側値倒催使便優宿傷保付府任代貸化花傾何荷俺久内肉座卒以似営善年夜換施遊旅物履屋握居層局遅訳択昼戸肩戻涙示礼社視宗祭察由油宙届押申伸神果菓課析所近断質訴昨作雪録尋急寝婦掃当争事糖康君耐端両満画歯曲料図用備昔借措散席度渡焼半伴判巻勝芝不否杯族知挨務帰引強弱沸費第弟汚与写身謝老考教者煮暑狭追師管父交効較校足促距路躍踏骨鍋過阪際障陽防院降階隣隠穴空突究探深浜織線締維練緒続絵絞絡結終紀納紹経約細繰継緑縁緊縄幼後幾機玄係孫懸御服命齢冷勇通踊疑範犯危腕卵印興酒配頭短豊喜皿血温監盛塩銀節限眼良娘食飯飲飾餌館平呼評希胸離殺純壁避新親幸報叫収勢熱述術寒素青精請情晴静責積表契害割生星性産春奏実勤難乗今含念予野嫌西価要腰標南問間簡開閉聞非悲罪輩喉決偉違韓干汗宇余除徐途塗束頼速整剣険検重動働種病症痩疲痛癖医区仰迎登発療形影顔参珍文対済楽薬渋央英映赤変恋横把色絶甘期基貴遣舞無組狙祖査助並普業僕共供異港選囲悪触解再購構論紙婚低抵底民眠捕部都那響派段司伺飼般船益暇敷来気飛妻面革靴声承極邪釈番宅為長張髪展単戦弾桜脳悩厳鳥鳴島援属偶逆就勉象像馬験駅駄驚慮鹿能態演震振娠唇濃送関咲魅襲雰嬉揃遥々炒

Core5k, top 500 kanji, Heisig order, including one not in RTK1+3

一二三四五六七八九十口日月目古明品早世自白中上下朝見元頑的首直具真有切昭別町可子女好母小少大多外名光川水原願活況土時火点魚同向守完安木相本味若葉暮犬状然猫特先介界茶合現全理主注金道辺車連前格客夏落運夢高京売学覚書言話語読調談域成止歩企歴定走越是題建初転雨冬立泣商頃比皆毎海歌次姿音識境亡方説増東地風起記電家場美詳達着誰集進雑確午習困国因回店庭心忘認誌思応意想息感怖憶必手我議打捨指持描接掛研材存在及丈史奴怒友設返将受授愛払広始去法会至致互育流出山入分容常婆残死耳取最敢聴買置環夫規失力男努加行律得待彼徴街程和私利数類様求笑答人住位体件仕他伝信例健使便優付代化花何俺内座以似年夜施遊旅物屋局訳戻示社由申神果所近断質昨作寝当争事康君両画曲料図用備昔度伴判勝不杯族知引強与写身考教者父効校足過際陽院隣突究探深練緒続絵絡結終紀紹経繰後幾機係懸命通危印興酒配頭喜限眼良娘食飯飲平呼評胸離殺新親幸報述素情晴表割生性実難乗今含念予嫌価要問間簡開聞非悲決違余途頼整険重動働種病痛迎発形影顔参対済楽英映変横色絶期基無並普業僕共供異選囲悪触解構紙婚低民部響段飼船来気飛妻面声番長張展単戦厳島逆勉象馬験駅驚能態振送関雰嬉々

Notes

々, which shows up near the end of the Heisig-sorted lists (last entry in the 500 list), is the ideographic iteration mark, indicating kanji repeats. It's not a kanji as such.

I haven't taken a super-close look at these lists, other than verifying 一 through 十 were all there, so if there are any glaring problems, please let me know. I realize that by jiggering these numbers a little to the right, i.e., choosing 502 or 1011, I might have gotten more bang for the buck (the extra 2 or 11 kanji adding much coverage), but I just threw this together in a single JavaScript console session and want to start learning them :) I don't have any more time for meta!

And on the subject of RTK Lite, I think since I'm already half-way through RTK1, I might as well do the full 1503 list and get all 5000 sentences in Core5k covered.
#27
Phenomenal. Thank you and your wife for this deck, esp the added audio. Exactly what I was looking for.
#28
I would just like to add that this deck is not recommended for beginners.

I would recommend being low-mid intermediate before attempting this deck.

I also highly recommend using MorphMan to sort the cards into the optimal learning order.
#29
I must be dumb, but the deck I downloaded and imported into Anki does not seem to have any audio. All "sound" fields are empty. :(

Does anybody have a clue?

Thanks a lot for the great deck. It looks very useful.

Edit: problem solved. It was my mistake. I had mixed up two files. Everything works fine.
Edited: 2014-10-05, 2:19 pm
#30
Downloaded the deck, looked through it, and the quality is amazing. Significantly better than Core6K. Audio is slightly, but noticeably, better (not much room for improvement, since Core6K has good audio too), and the sentences are much better. Far more useful for learning grammar. And, at intermediate level at least, you can pretty much just learn the grammar points straight up, just with the help of the translation. Not much explanation needed.

Add to that the updated word selection (Core6K may have been the most popular words in newspapers once, but I seriously doubt it is today), and I highly recommend using this deck instead of Core, for intermediate learners.

Even though it comes a little late for me (I've done a big chunk of Core6K already) I think I'll drill through the sentences anyway, with the audio up front, for listening comprehension. The sentences are read fairly fast too, which is great.
Edited: 2014-10-05, 8:47 am
#31
I second this. Great deck, a very relevant alternative to the regular core decks. Not as beginner friendly perhaps, but it doesn't feel outdated at all. I've been using it for about a week now, and I just ditched the regular core deck even though I was already pretty far in - I just got really tired of 80's style textbookish "business" Japanese.
#32
There is a significant volume differential between some of the audio files (up to 25%, according to my analyzer thingy). It can be a little distracting.

I ran my local media files through a volume booster, to set them all to the same level, but it's only a tiny little free program, so the results aren't perfect (probably not worth sharing). I thought they're an improvement, but I bet there's better software out there than Mp3Gain (that's what I used).

Maybe someone who works with audio regularly has a better solution to fix this issue (if anyone else even thinks it's an issue - maybe it's just in my head).
Edited: 2014-10-11, 3:41 pm
#33
I got this deck waiting for me in Anki, but I'll finish Core6k first anyway, and some smaller grammar decks in between...

Is there any easy, automatic way to suspend all cards that already contain known words? Or better yet, perhaps suspend all equivalent cards in Core6k.
#34
aldebrn Wrote:I am slowly coming around to the RTK Lite idea, so I thought I'd use this Core5000 deck to make an RTK Lite list, as its source frequency dictionary is based on the latest Japanese corpus research.

There are 1503 kanji that appear in Core5000's "word" field. (…)
I am a bit surprised by those numbers. Does this mean that 5000 words is far from enough to include most of the usual vocabulary, or that the official list of over 2000 kanji includes several hundred infrequent characters (somehow supporting the RTK-lite idea)?

How far should one go in the word frequency list to capture all, or perhaps 95% of the jouyou kanji?
#35
You will notice that he scanned the kanji in the "word" field, not the "expression" field, so that number only represents the 5000 individual words, not the 5000 sentences as a whole.


That being said, you would be surprised at how few kanji are actually used in native media.

There was an interesting article written here: http://pomax.nihongoresources.com/index....1223039457

Basically:
Here's the list of how many kanji you need to know in order to have covered which percentage of kanji-use (on average) in about 1300 novels:

up to 10%: 11 kanji
up to 20%: 20 more kanji: 31
up to 30%: 33 more kanji: 64
up to 40%: 50 more kanji: 114
up to 50%: 74 more kanji: 188
up to 60%: 103 more kanji: 291
up to 70%: 148 more kanji: 439
up to 80%: 222 more kanji: 661
up to 90%: 384 more kanji: 1045
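A table like this comes from sorting kanji by frequency and counting how many are needed before the running total reaches each percentage of all kanji occurrences. Here is a minimal sketch of that calculation. The counts are invented toy numbers, not data from the article, and the function name is hypothetical.

```python
from collections import Counter


def kanji_needed_for_coverage(counts, percents):
    """Map each coverage percentage to the number of most-frequent kanji needed."""
    total = sum(counts.values())
    targets = sorted(percents)
    needed = {}
    running = 0
    idx = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        running += count
        # Integer arithmetic avoids floating-point surprises at exact boundaries.
        while idx < len(targets) and running * 100 >= targets[idx] * total:
            needed[targets[idx]] = n
            idx += 1
    return needed


# Invented toy frequencies (occurrences per kanji), for illustration only.
toy_counts = Counter({"日": 50, "人": 30, "一": 15, "鬱": 5})
print(kanji_needed_for_coverage(toy_counts, [50, 80, 95]))
```

The long tail is why each extra 10% costs more kanji than the last: the rare characters add very little to the running total.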
Edited: 2014-10-13, 6:13 am
#36
Nayr182 Wrote:Here's the list of how many kanji you need to know in order to have covered which percentage of kanji-use (on average) in about 1300 novels:

up to 10%: 11 kanji
up to 20%: 20 more kanji: 31
up to 30%: 33 more kanji: 64
up to 40%: 50 more kanji: 114
up to 50%: 74 more kanji: 188
up to 60%: 103 more kanji: 291
up to 70%: 148 more kanji: 439
up to 80%: 222 more kanji: 661
up to 90%: 384 more kanji: 1045
GREAT quote Nayr182, but you left out the most interesting numbers! Here they are:

"up to 95%: 377 more kanji: 1422
"up to 99%: 768 more kanji: 2190"

According to this talk on language mastery by a "polyglot" computer scientist (which I think is a treasure trove of information and totally worth watching at 1.25x speed), you need 98% comprehension for "pleasant free reading". That's somewhere around 2000 kanji. But I'd guess those kanji won't be exclusively the jouyou kanji of RTK volume 1, since the 1503 kanji in Core5000 (the top 5000 words in Japanese according to the latest Japanese corpus research) include some outside RTK1/3.

Also, according to this talk, averaged over various unspecified languages, 98% corresponds to 20'000 words, which is apparently how many words one learns up to adulthood (by age 20, people on average know 20'000 words, having learned about 1000 per year; another kanji learner made some notes at http://kazemakase.ca/2013/08/04/fluent-c...-polyglot/).

jmigno Wrote:Does this mean that 5000 words is far from enough to include most of the usual vocabulary, or that the official list of over 2000 kanji includes several hundred infrequent characters (somehow supporting the RTK-lite idea)?
Definitely the latter. The jouyou list contains many legacies of politicking and cultural posturing ("what will our ancestors think if we allow our children to never learn 藤!?" (藤 as in Fujiwara)). These 1503 kanji needed to "spell" the top 5000 words should take you to nearly 90% comprehension.

Edit: here are the 7.5 kanji in Core5k's words that aren't in Katsuo's combined-RTK1/3 spreadsheet: 炒贅籠餃姜壼埃 (and 々).
Edited: 2014-10-28, 10:57 am
#37
Thank you for providing all these data. This is quite illuminating!
I am definitely sticking to RTK lite for now!
#38
I would be interested to know how many kanji appear from the 'expression' field.
Could you run the numbers on that aldebrn?
#39
Nayr182 Wrote:I would be interested to know how many kanji appear from the 'expression' field.
Could you run the numbers on that aldebrn?
どうぞください:

* 1620: # of unique kanji appearing in the 'expression' field (sentences)
* 1503: # of unique kanji appearing in the 'word' field
* 127 kanji in expression sentences but not in word: 貨毒奨吠博尾掻幽旗蒸扇沌暖陸旬叔炊泥潔拍鉢億曇貼姻遭弊漁偽肯軟糧彰茂却誉噛牧晦航券麻痺銭濡凶斬嘆罰蛍疎巾詐欺清莫姑倹揚汁崖朴忠帝嘘騙塀鳩叩掌灰巧詣摂猛岐寮帳脈阻裂徳挟桃河薪袖詠旺栓票憾函飴漫釜譜苺湿荘廷寸貪侍碗捻召撲刈噂憂鬱超溢壇澄津妨拝粘汲狩惹籍庁罠玩
* Note that 127 is not 1620-1503=117: some kanji appear in 'word' and not in 'expression'. This is because sometimes kanji in 'word' are spelled out with kana in the 'expression'. 10 kanji in 'word' but missing from 'expression' due to this: 敢是幾揃膝癖奪御巡総
* Edit: total number of unique kanji in 'word' and 'expression' fields: 1630.
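The counts above are plain set arithmetic, which can be checked with a short sketch. The two small sets are toy stand-ins for the real 'word' and 'expression' kanji sets. The final assertions only plug in the totals quoted in this post.

```python
# Toy stand-ins for the real kanji sets (hypothetical contents).
word_kanji = set("敢是事言")
expression_kanji = set("事言貨毒尾")

only_in_expression = expression_kanji - word_kanji  # in sentences, not in words
only_in_word = word_kanji - expression_kanji        # in words, not in sentences
in_both = word_kanji & expression_kanji
union = word_kanji | expression_kanji

# The union is everything in 'expression' plus what only 'word' has.
assert len(union) == len(expression_kanji) + len(only_in_word)

# Same identities with the totals quoted in the post.
assert 1620 - 127 == 1503 - 10 == 1493  # kanji shared by both fields
assert 1620 + 10 == 1630                # unique kanji across both fields
print(len(only_in_expression), len(only_in_word), len(in_both), len(union))
```

So the 117 versus 127 gap is exactly the 10 kanji that appear only in 'word'.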
Edited: 2014-10-17, 9:20 pm
#40
aldebrn Wrote:
Nayr182 Wrote:I would be interested to know how many kanji appear from the 'expression' field.
Could you run the numbers on that aldebrn?
どうぞください:

* 1620: # of unique kanji appearing in the 'expression' field (sentences)
* 1503: # of unique kanji appearing in the 'word' field
* 127 kanji in expression sentences but not in word: 貨毒奨吠博尾掻幽旗蒸扇沌暖陸旬叔炊泥潔拍鉢億曇貼姻遭弊漁偽肯軟糧彰茂却誉噛牧晦航券麻痺銭濡凶斬嘆罰蛍疎巾詐欺清莫姑倹揚汁崖朴忠帝嘘騙塀鳩叩掌灰巧詣摂猛岐寮帳脈阻裂徳挟桃河薪袖詠旺栓票憾函飴漫釜譜苺湿荘廷寸貪侍碗捻召撲刈噂憂鬱超溢壇澄津妨拝粘汲狩惹籍庁罠玩
* Note that 127 is not 1620-1503=117: some kanji appear in 'word' and not in 'expression'. This is because sometimes kanji in 'word' are spelled out with kana in the 'expression'. 10 kanji in 'word' but missing from 'expression' due to this: 敢是幾揃膝癖奪御巡総
* Edit: total number of unique kanji in 'word' and 'expression' fields: 1630.
手伝ってくれてありがとう。
#41
Nayr182 Wrote:
aldebrn Wrote:どうぞください
手伝ってくれてありがとう。
私は亀が好きだ。
#42
I went ahead and took the liberty of splitting the "Word" field into separate fields yesterday (kanji, romaji, pos1, meaning1, pos2, meaning2, pos3, meaning3). Since I don't have any programming knowledge whatsoever, it meant blowing a day fiddling around in Excel splitting/merging columns, but I think I managed to avoid screwing up anything major... hopefully. I was too burned out when I finished to really thoroughly check it, and now I don't want to.

http://www.mediafire.com/download/y57n19...e5000.apkg

No idea if this can be merged into the original deck since I tweaked some stuff to make the text exports/imports I was doing easier and split a few cards with multiple words on them into separate ones (so there are 5006 cards now), but hopefully somebody finds it useful. But even if you don't I only did it for myself anyway, so that's okay.
#43
Thanks for the deck and to your wife for doing great audio. I've only just started it and I'm gonna work through it and core 6k at the same time.. (Bad idea?)
#44
I am sure others will find it useful jmignot.

I am currently working on a Core5000 v2.0 which should be completed in the next few weeks.

Basically, I'm getting my sister-in-law to help add kanji in places where there are supposed to be kanji, and just giving it another check and making it more consistent.

For some reason the book likes to use hiragana in some places and the kanji for the same word in other places.
#45
kraemder Wrote:Thanks for the deck and to your wife for doing great audio. I've only just started it and I'm gonna work through it and core 6k at the same time.. (Bad idea?)
I wouldn't say bad idea as much as I would say bad investment of time. If I was you I would just do one or the other.
#46
Nayr182 Wrote:For some reason the book likes to use hiragana in some places and the kanji for the same word in other places.
Is that for general vocab? Or is it things like "to iu ~" which is always written in hiragana when it functions more like a grammatical particle than a verb.
#47
ktcgx Wrote:
Nayr182 Wrote:For some reason the book likes to use hiragana in some places and the kanji for the same word in other places.
Is that for general vocab? Or is it things like "to iu ~" which is always written in hiragana when it functions more like a grammatical particle than a verb.
For example sometimes oishii will be written 美味しい and then other times will be written おいしい. まで instead of 迄、ごろ instead of 頃. Etc.


I want to assure everyone that these changes aren't just made willy nilly.

I have two native Japanese people go through each card one by one, making sure everything is written the way a normal every day Japanese person would write it.
#48
Sorry, I didn't mean to come across as doubting your methods or ability ;)

I think whether kanji are used or not could come down to the stylistic choices of the authors of the texts from which the compilers of the frequency dictionary took their example sentences.
Edited: 2014-11-02, 10:12 pm
#49
Nayr182 Wrote:
kraemder Wrote:Thanks for the deck and to your wife for doing great audio. I've only just started it and I'm gonna work through it and core 6k at the same time.. (Bad idea?)
I wouldn't say bad idea as much as I would say bad investment of time. If I was you I would just do one or the other.
You made a good point about the time investment. I'm kind of bored of the core 6k anyway - I've started and gotten distracted doing it so many times. And from what little I've seen of this deck the sentences are in fact longer, allowing for more complex grammar, which would let me kill two birds with one stone. I think I'll just switch to this deck instead.

Will version 2.0 be an update to this deck I can download without it ruining my stats or will I be starting over?
#50
kraemder Wrote:
Nayr182 Wrote:
kraemder Wrote:Thanks for the deck and to your wife for doing great audio. I've only just started it and I'm gonna work through it and core 6k at the same time.. (Bad idea?)
I wouldn't say bad idea as much as I would say bad investment of time. If I was you I would just do one or the other.
You made a good point about the time investment. I'm kind of bored of the core 6k anyway - I've started and gotten distracted doing it so many times. And from what little I've seen of this deck the sentences are in fact longer, allowing for more complex grammar, which would let me kill two birds with one stone. I think I'll just switch to this deck instead.

Will version 2.0 be an update to this deck I can download without it ruining my stats or will I be starting over?
If possible (unless I stuff something up) I will try to make it so you can just download the deck again and update it.