
kore: core2k+core6k complete, enhanced and sorted by kanji

#1
* smart.fm Japanese core 2000 and core 6000 vocab linked with corresponding sentences
* kana-only and furigana readings
* sentence dictionary lookup
* local and remote sound
* vocab, sentence and list ids and per list and overall indices
* sort indices for sorting on sentence-expression by kanji in rtkliteko2001+rtk+frequency, rtk+frequency, ko2001+frequency, and frequency orders
* highest kanji number and new required kanji progression for each sort order
* highlighting of vocab item within sentence and vocab parts-of-speech
* vocab JLPT level
* cloze deletion on vocab in sentences

https://sites.google.com/site/ankinihongo/

(edit: added JLPT, changed link, added cloze)
Edited: 2010-12-16, 12:30 am
#2
That's pretty cool. When you say 'sorted by kanji', what order are the kanji in? By grade level, or frequency or... ?
#3
Thanks for this. I've spent a few minutes setting up a deck and pruning the fields I don't need for this specific deck. And so far it looks like pure win. I'm loving the highlights and sentence lookup. Epic win, nice.
#4
This looks wonderful, I just need to figure out how to use it. What's the difference between 'ko2001-index' and 'ko2001-kanji' sorting? Both of the numbers are far above the index number included in the books. I'm having trouble figuring it out.
#5
I'm curious about those as well (fields 20-27).

I guess frequency index would probably be the most useful for most people?
#6
Cangy, thanks for the list. What I'll do once I return to the US this week is update the Google spreadsheet with this info. A big reason for this is the spreadsheet that's there now has a few of my "modifications" where I changed words to kanji that probably should have been left alone. It's ok for me, but that's something each person should decide on their own.
#7
Cangy, Just spotted this in the documentation on the wiki:

" by default sorted in rtkliteko2001+rtk+frequency order "

does this initial, unchanged import into anki give the nice curve as you point out here?

http://forum.koohii.com/showthread.php?p...8#pid77378

I'm working through it at the moment and seeing some differences in order between the super smooth sorted core 2k and this one, but that might be due to it being the core 6k....
#8
What I do is group them into Core 2k (Step 1 and 2), Core 2k (Step 3-10), and Core 6k. After that, I sort them based on the order Cangy has.

This way you get beginner, beginner/intermediate, then intermediate sub-groups sorted by kanji. It also breaks up the large number of kana only sentences at the beginning.

PS: To sort Cangy's list, just sort the spreadsheet by List number. These follow the Core sequence fairly well. After that, just add a column saying what group the list number refers to (Core 2k Step 01, Core 6k step 12, etc.).
#9
When importing into Anki, does anyone have a good template?

I was considering something like what ReadTheKanji has...

Front:

Vocab Word
Sample Sentence

Back:

Vocab Word (with Furigana)
Sample Sentence (with Furigana)
Vocab Definition
Sample Sentence Translation

Possibly having the definition and translation hidden somehow.

Also maybe another card that has the audio on the front and the same back.

What I've found is that if I learn the meaning and pronunciation at the same time, and only worry about pronunciation, I'm not having any trouble with meaning.
#10
Nukemarine Wrote:What I do is group them into Core 2k (Step 1 and 2), Core 2k (Step 3-10), and Core 6k. After that, I sort them based on the order Cangy has.

This way you get beginner, beginner/intermediate, then intermediate sub-groups sorted by kanji. It also breaks up the large number of kana only sentences at the beginning.

PS: To sort Cangy's list, just sort the spreadsheet by List number. These follow the Core sequence fairly well. After that, just add a column saying what group the list number refers to (Core 2k Step 01, Core 6k step 12, etc.).
Sounds like it could be a winner. Thanks Nukemarine!
#11
the web page has a bit more detail than what I posted above, but it's still on the terse side and there's been a few questions, so I'll elaborate a bit more here

the index field is the original smart.fm order, 1-2000 for core 2000 and 2001-6000 for core 6000. the steps in core 2000 are 200 items each, so step 1 is 1-200, step 2 is 201-400 etc. the first 8 steps of core 6000 have 250 items each, and the last 4 have 500 each. also the list field corresponds to a step, and the list-index field is the position within that list/step
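
As a rough illustration of that arithmetic, here is a minimal Python sketch that maps an index to its step. The function name, the course labels, and the 1-based position are assumptions made for the example; the actual values stored in the list and list-index fields may differ.

```python
from typing import Tuple


def step_for_index(index: int) -> Tuple[str, int, int]:
    """Return (course, step, position in step) for a 1-based smart.fm index.

    Hypothetical helper: the labels and 1-based positions are assumptions.
    """
    if not 1 <= index <= 6000:
        raise ValueError("index must be in 1..6000")
    if index <= 2000:
        # Core 2000: ten steps of 200 items each.
        step, pos = divmod(index - 1, 200)
        return ("core2000", step + 1, pos + 1)
    offset = index - 2001  # 0-based position inside Core 6000
    if offset < 8 * 250:
        # Core 6000, steps 1-8: 250 items each.
        step, pos = divmod(offset, 250)
        return ("core6000", step + 1, pos + 1)
    # Core 6000, steps 9-12: 500 items each.
    step, pos = divmod(offset - 8 * 250, 500)
    return ("core6000", step + 9, pos + 1)


print(step_for_index(201))   # ('core2000', 2, 1)
print(step_for_index(4001))  # ('core6000', 9, 1)
```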

it's then been sorted by kanji on the sentence-expression field -- sorting by kanji means that sentences containing higher kanji come after sentences containing only lower kanji, as you might expect (though the first sentence to contain a lower kanji will come later if it only appears with higher kanji)
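
One way to picture "sorting by kanji" is to key each sentence on the position of its highest kanji in the chosen kanji order. The Python below is only a sketch of that idea, not the script that produced the list: the function names are made up, characters outside the kanji order are ignored, kana-only sentences get rank 0, and ties keep their original order (all of these are assumptions).

```python
def highest_kanji_rank(sentence, rank):
    """Rank of the highest-ranked kanji in the sentence, or 0 if it has none."""
    return max((rank[ch] for ch in sentence if ch in rank), default=0)


def sort_by_kanji(sentences, kanji_order):
    """Sort sentences so those needing later kanji come after earlier ones."""
    # 1-based position of each kanji in the chosen order.
    rank = {kanji: i for i, kanji in enumerate(kanji_order, start=1)}
    # sorted() is stable, so sentences with equal rank keep their input order.
    return sorted(sentences, key=lambda s: highest_kanji_rank(s, rank))


order = ["日", "本", "語"]
sentences = ["日本語です", "これです", "日です", "本です"]
print(sort_by_kanji(sentences, order))
# ['これです', '日です', '本です', '日本語です']
```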

the rtkliteko2001-index, rtk-index, ko2001-index and freq-index fields give the sort order, 1-6000, after sorting by kanji in rtkliteko2001+rtk+frequency, rtk+frequency, ko2001+frequency, and frequency orders

frequency order is the frequency of all 1650 kanji that occur in the sentences, from most frequent (彼 of course!) to least frequent

rtk is rtk1+3 order (2042+965 kanji), ko2001 is ko2001 order (2000 kanji), rtkliteko2001 is the first 1000 ko2001 kanji in rtk order with required primitives (1212 kanji)

rtkliteko2001+rtk+frequency means the rtkliteko2001 kanji, then the rtk kanji not already in rtkliteko2001, then the frequency kanji not already in rtkliteko2001 or rtk, in order
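
In other words, the combined order is a concatenation that skips kanji already placed. A small Python sketch of that (the function name and the toy lists are made up for illustration):

```python
def combine_orders(*orders):
    """Concatenate kanji orders, keeping only the first occurrence of each kanji."""
    seen = set()
    combined = []
    for order in orders:
        for kanji in order:
            if kanji not in seen:
                seen.add(kanji)
                combined.append(kanji)
    return combined


# Toy stand-ins for the rtkliteko2001, rtk and frequency lists.
lite = ["一", "二"]
rtk = ["二", "三", "一"]
freq = ["四", "三"]
print(combine_orders(lite, rtk, freq))  # ['一', '二', '三', '四']
```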

the current order is rtkliteko2001+rtk+frequency, and you can change the order just by doing a numeric sort on the appropriate index field

if you've finished rtk then you can choose any order. if you want to do the vocab/sentences while doing rtk or ko2001 or rtkliteko2001, then you can sort as appropriate

if you want the kanji introduced very gradually initially (but at a higher rate toward the end) you could try frequency order. if you want them introduced more evenly, then ko2001 order seems to be closer to maintaining a constant rate over the deck. I've added a graph to the web page which shows the effect of sorting on new kanji introduction

the rtk-kanji, ko2001-kanji, freq-kanji and rtkliteko2001-kanji fields each contain 2 parts, a kanji index number and a (possibly empty) list of kanji

the number is the index in the sort order of the highest kanji that occurs in that sentence. so if you have sorted by rtk-index then the rtk-kanji number is the frame number you need to be up to to do that sentence (but after 3007 it's 3007 plus the index in the frequency list...)

the list of kanji are the new kanji introduced in that sentence compared with the previous sentence. it'll be empty if the sentence doesn't introduce any new kanji, and it'll contain multiple kanji if one of the lower kanji only occurs with that higher kanji (so couldn't appear earlier)
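
To make the two parts concrete, here is a Python sketch that computes a number and a new-kanji list for each sentence of an already sorted deck. It reads "new" as "not seen in any earlier sentence", which is my interpretation of the description above; the function name and the output shape are also assumptions, not the real field format.

```python
def annotate(sorted_sentences, kanji_order):
    """Return (highest kanji index, new kanji) for each sentence, in deck order."""
    rank = {kanji: i for i, kanji in enumerate(kanji_order, start=1)}
    seen = set()
    rows = []
    for sentence in sorted_sentences:
        present = [ch for ch in sentence if ch in rank]
        # Index of the highest kanji in this sentence (0 for kana-only).
        highest = max((rank[ch] for ch in present), default=0)
        # Kanji appearing here that no earlier sentence contained.
        new = []
        for ch in present:
            if ch not in seen:
                seen.add(ch)
                new.append(ch)
        rows.append((highest, "".join(new)))
    return rows


deck = ["これです", "日です", "日本語です"]
print(annotate(deck, ["日", "本", "語"]))
# [(0, ''), (1, '日'), (3, '本語')]
```

In the toy deck 本 only ever occurs together with 語, so both show up as new in the same sentence, which is the "multiple kanji" case.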

so if you are learning ko2001 while going through the ko2001 sorted deck, then if you are up to the kanji corresponding to the number then you must have covered all the kanji in the sentences up to that point. but you've probably covered more (as some will occur only with higher kanji and so haven't appeared yet) so you could instead just learn them as they come up in the new kanji list

also, I've added a sample question and answer template to the web page for recognition cards using the xxfurigana plugin
Edited: 2010-03-02, 8:00 am
#12
cangy Wrote:the web page has a bit more detail than what I posted above, but it's still on the terse side and there's been a few questions, so I'll elaborate a bit more here
Awesome, that definitely clears things up for me. And the example card format is great, too!

Thanks for all your work on this!
#13
You are a legend! Thanks!
#14
This looks awesome, but could anyone elaborate on how to sort it? I have a feeling I'm overlooking something obvious, because I really don't get it.
#15
astendra Wrote:This looks awesome, but could anyone elaborate on how to sort it? I have a feeling I'm overlooking something obvious, because I really don't get it.
Open it in a spreadsheet program (as a CSV) and then use the program to sort on the column that you want.

The default sort is pretty good, but I chose ko2001 for mine as I liked the progression better.

Edit: Note that mine didn't save as utf8 properly the first time... I had to hit 'save as' and specifically tell it utf8 in OpenOffice.
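
If you'd rather skip the spreadsheet, the same sort can be sketched in a few lines of Python, which also sidesteps the encoding problem by reading and writing UTF-8 explicitly. The tab delimiter, the lack of a header row, and the column position are assumptions about the file layout, and `sort_file` is a made-up name, so check them against your copy first.

```python
import csv


def sort_file(src, dst, column, delimiter="\t"):
    """Sort a delimited text file numerically on one column, as UTF-8.

    `column` is the 0-based position of the index field to sort on
    (hypothetical; look it up in your own file).
    """
    with open(src, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=delimiter))
    # Numeric sort, so 2 comes before 10.
    rows.sort(key=lambda row: int(row[column]))
    with open(dst, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter=delimiter).writerows(rows)
```

For example, `sort_file("kore.txt", "kore-sorted.txt", 0)` would sort on the first column, assuming every row has a number there.
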
Edited: 2010-03-02, 9:25 am
#16
Yeah, I don't really care for RTK sorting since I'm pretty much done with it, and the RTK kanji progression wasn't really made on a frequency-of-use basis.

But well, that clears things up. ありがとうございます!
#17
First, thank you cangy for this amazing resource. The resources this community shares are beyond compare. Anyway, is there any way to easily transfer progress between anki decks? I'd like to switch over to cangy's new core2k+6k sorting, but I'd rather not change the intervals on each card one-by-one. Any help will be greatly appreciated.
#18
Currently rerunning your kore script because i want the "unmunged" local filenames. When it's done, I'll post the file.

One first remark: if you're writing a shell script with bash syntax, use /bin/bash after the shebang!
#19
@kriskelvin: Wait 2 seconds, are you talking about downloading all the sound files so they play correctly?

If that's what you're talking about, I did that all last night. There's 11,970 files, so theoretically there's 15 duplicate items. But hey, 15 out of 6000 missing audio isn't that bad...

vileru Wrote:Anyway, is there any way to easily transfer progress between anki decks? I'd like to switch over to cangy's new core2k+6k sorting, but I'd rather not change the intervals on each card one-by-one. Any help will be greatly appreciated.
I'm not exactly sure how you're doing this, but what I just did was consolidate all my decks into one big "Everything" deck. Each deck has its own tag, so I can just export the decks as separate Anki files again, and import them into the original decks, thus saving all the scheduling information.

Not 100% sure how this will work out with your implementation, but good luck
Edited: 2010-03-03, 11:00 pm
#20
This is great! Just wanted to post here and say thanks.

One suggestion: change the sort numbers from 1 2 3 4... to 0001 0002 0003 0004... If you do this you can sort better after you import into anki, in case you want to look at one of the other sorting methods. I did this and re-imported because anki would sort them in this order: 1 10 100 1000 1001 1002 etc.
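
A quick Python sketch of why the padding helps: text sorting compares character by character, so unpadded numbers come out as 1 10 100 2, while zero-padded ones sort the same way as the numbers themselves. The helper name is made up; a width of 4 is enough for indices up to 6000.

```python
def zero_pad(value: str, width: int = 4) -> str:
    """Left-pad a numeric string with zeros so text order matches numeric order."""
    return value.strip().zfill(width)


indices = ["1", "10", "100", "2"]
print(sorted(indices))                       # ['1', '10', '100', '2']
print(sorted(zero_pad(v) for v in indices))  # ['0001', '0002', '0010', '0100']
```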
#21
makurabin Wrote:One suggestion: change the sort numbers from 1 2 3 4... to 0001 0002 0003 0004... If you do this you can sort better after you import into anki, in case you want to look at one of the other sorting methods. I did this and re-imported because anki would sort them in this order: 1 10 100 1000 1001 1002 etc.
I always add an index to the files I import just so I'm sure that the ordering makes it into Anki. Instead of changing the sheet you can just select "Sort as numbers" in the Model Properties -> Fields - Options and it will fix that for you.

[edited for shocking typo]
Edited: 2010-03-04, 8:01 am
#22
This looks really awesome! I already imported the deck into Anki, but I have one problem left: The sound filenames from sentence-sound-local don't match with the files from the media packages.
Any idea what went wrong or how to fix that?
#23
Proxx Wrote:This looks really awesome! I already imported the deck into Anki, but I have one problem left: The sound filenames from sentence-sound-local don't match with the files from the media packages.
Any idea what went wrong or how to fix that?
That's odd. They matched for me just fine... I've made a deck and it's speaking to me just like I expect.

Maybe download from one of the other sources and see if that helps?
#24
@wccrawford:

Try this one: http://rs869.rapidshare.com/files/358951845/kore.zip

I rebuilt kore.txt with cangy's scripts - this file is untouched by anki, so the audio file names are the ones from the media packages and the scanki downloader.

(Afterwards I ran anki's Check Media DB on the deck built from this file --> no probs)
#25
wccrawford Wrote:That's odd. They matched for me just fine... I've made a deck and it's speaking to me just like I expect.

Maybe download from one of the other sources and see if that helps?
Thank you. It seems that I just forgot the Core 2000 sound files...
Downloading right now...

Edit: Yay it's working! Great!
Edited: 2010-03-04, 5:45 pm