
Sort index difference between Nukemarine's Kore and Core_2k6k10K_Further_Optimized

#1
Hello,

After finishing RTK last year around May (see my post for Anki stats here), I continued studying, and the natural way was learning vocab through sentences. So I got my hands on a sentence deck, which I believe is the last version of the famous Core10k deck that was taken down from Ankiweb: the Core_2k6k10K_Further_Optimized_PICSOUNDPITCH_ACCENT_v23 version (download found on this Reddit thread).
As mentioned there, I re-sorted it by repositioning the cards according to the Optimized-Voc-Index2k+4kDefault index.
Anecdotal remark: I believe this deck originates from the one in this thread by pmnox, but that thread doesn't seem to go beyond v18.

OK, so I was all set and studied about 200 cards, but about a month later (a year ago now) I had to stop because I had less time available. Now I have time again and am looking to pick up where I left off.

The thing is, out of curiosity, I happened to look around and realized that there was actually another deck for learning vocab through sentences with audio and images (which is what I'm interested in): Nukemarine's Kore deck. (Well, from what I understand it's really a spreadsheet that you have to import as a deck and then create a template for; fair enough, I can do that.)

Now the questions: Nukemarine's Kore seems to be sorted by default by the New-Opt-Voc-Index, which is a different order from the Optimized-Voc-Index2k+4kDefault index. For instance, the first word is それ rather than 一つ.

1) What is the difference between these orders?
As I understand it, they are all based on a frequency list (KO2001), which was then refined with the i+1 method. This method makes each new card introduce only one new kanji, and keeps cards that share a kanji grouped together: for example, 一般 is 3rd in the Optimized-Voc-Index2k+4kDefault index (right after 一, which is 2nd), but it is number 1185(!) in the Kore deck (according to New-Opt-Voc-Index). So there seem to be big differences between these decks' orders, and I would like to use the most optimal order to start learning.
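To make the i+1 idea concrete, here is a minimal Python sketch (not taken from either deck) that counts how many never-before-seen kanji each word introduces in a given study order; an i+1 order is one where no count exceeds 1. It assumes a character is a kanji if it falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which ignores rarer extension blocks. The sample order mirrors the 一つ, 一, 一般 start described above.

```python
def kanji_in(word: str) -> list[str]:
    """Characters treated as kanji: the CJK Unified Ideographs block (an assumption)."""
    return [ch for ch in word if "\u4e00" <= ch <= "\u9fff"]


def new_kanji_counts(words: list[str]) -> list[int]:
    """For each word in study order, count kanji that no earlier word used."""
    seen: set[str] = set()
    counts = []
    for word in words:
        fresh = set(kanji_in(word)) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def is_i_plus_one(words: list[str]) -> bool:
    """True if every word introduces at most one new kanji."""
    return all(c <= 1 for c in new_kanji_counts(words))


order = ["一つ", "一", "一般", "それ"]
print(new_kanji_counts(order))  # [1, 0, 1, 0]
print(is_i_plus_one(order))     # True
```

Starting the same list with 一般 instead would give a count of 2 for the first card, which is exactly what an i+1 order avoids.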

2) There seem to be many other sorting indexes in the Kore deck, and I would like to know what these three do: Sent-KO-Index, Opt-Voc-Index, Opt-Sen-Index.
- I assume Opt-Voc-Index is an older version of New-Opt-Voc-Index (duh!), but what exactly changed?
- From this post I found that
Quote:"Opt-Sen-Index [...] sorts the sentences to have the least amount of words you are seeing for the first time (aka n+1)"
What does that mean exactly, and in particular, how is it different from i+1? Which is best for learning vocab through sentences? (I'm not sure what the purpose of learning just sentences is.)
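For comparison, an n+1 sentence order can be approximated with a greedy pass: repeatedly pick the sentence containing the fewest words not yet seen. The Python sketch below is only an illustration under assumptions (sentences already split into words, ties broken by original position); it is not necessarily how Opt-Sen-Index was actually built. The key difference from i+1 is that it counts unseen words per sentence rather than unseen kanji per word.

```python
def n_plus_one_order(sentences: list[list[str]]) -> list[int]:
    """Greedy ordering: repeatedly pick the sentence with the fewest unseen words.

    Each sentence is assumed to be pre-split into words. Returns sentence
    indices in study order; ties keep the original order.
    """
    seen: set[str] = set()
    remaining = list(range(len(sentences)))
    order = []
    while remaining:
        # Number of new words this sentence would introduce right now.
        best = min(remaining, key=lambda i: (len(set(sentences[i]) - seen), i))
        order.append(best)
        seen.update(sentences[best])
        remaining.remove(best)
    return order


sentences = [["これ", "は", "本", "です"], ["これ", "は", "ペン", "です"], ["本", "です"]]
print(n_plus_one_order(sentences))  # [2, 0, 1]
```

This greedy version is quadratic in the number of sentences, which is fine for illustration but would be slow on a full 10k list.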
Thanks a lot in advance.
#2
The sorted Kore 2k/6k is a bit out of date. I've updated the order and posted the resources online in the thread here. I sorted it basically as I described in the thread you linked. Some people sorted based on the RTK index. I used the 2001.Kanji.Odyssey list because that kanji list, while horrible for learning kanji in bulk, seemed to work fine for sorting vocabulary lists.

By the way, the spreadsheet I posted has a number of sorting indexes. Here's what they do:
  • Core Index - Original order from iKnow's website.
  • 2k1-Kanken Opt Sort - The order I use. I used groups of 2000 words (Core 2k, then 4k, then 6k, etc.). There's additional sorting (the first 1000 words only use 555 kanji, the next 1000 words use 1110 kanji, the next 1000 use 1550, etc.).
  • Freq Opt Sort - In this, it's grouped by 1000 words by frequency then sorted using 2k1/RTK/Kanken index. 
  • 2k1-Kanken-Index - This was the output of Cangy's sorting program using the 2k1/RTK/Kanken as the sorting index.

Personally, I think either opt sort list is fine. I set up the first one because I was teaching 1000 words limited to the 555 kanji I had people learn in my course. After that, I added another 555 kanji, so I used the total of 1110 to limit which words appeared in the list. If you already know lots of kanji, the Freq Opt Sort might be more fun.
#3
Thanks a lot Nukemarine, I couldn't hope for a better answer than one from the deck's creator :)
(Also, thanks for all your efforts and work for the community in general. It's been 5 years since I started looking around for resources to learn Japanese, and your name always ends up showing up!)

I would like more details on a few things:

So it seems the Kore deck from the spreadsheet I had posted is outdated, and there is an updated one in the thread you mention, great!

1) First, a question I should have asked before: this new deck, the old Kore, and the 'original' Core10k (the one I have) all share the same core material, right? By that I mean the now-classic vocab/sentences/audio/images from Smart.fm. Is it just the default order (the subject of my question) that differs, plus a few extra columns like pitch, other orderings, etc.?

2) In your new deck (I'm assuming you mean this one, right?) I'm still not very sure about the default ordering you used. You say
Quote:I sorted it basically like I listed in the thread you linked
are you talking about the pmnox thread I had linked?
I am a little confused because you say that you sorted it using KO2001, but the one I linked was also sorted this way, so there is no basic difference.
As I said, and this was the subject of my questions, I believe the Core10k default ordering, which seems to me to be based on Cangy's Kore deck, and your deck's default ordering are both based on a frequency list (here KO2001) plus the i+1 method, no?
So my question is: why is the output different, given that the input list and the method are the same?

3) You say either 'opt sort' list is fine, so do you mean either the New opt sort index or the Old opt sort index, since those are the only two 'opt sort' indexes I can find in your spreadsheet?
Now that I see there are even more indexes that all claim to be optimal at something, I'm even more confused than before.
Here is what I see in the spreadsheet; could you expand on how these were made?
  • Core index: you have explained it
  • New-Opt-Sort: not sure what that is, but the spreadsheet is sorted by it by default, so maybe it's what you call 2k1-Kanken Opt Sort?
  • 2k1 Kanken index: probably what you described in your fourth bullet?
  • Old Core index: not sure what that is
  • Opt 2k Index: not sure what that is
  • Opt-index: not sure what that is
  • Old-Opt-sort: not sure what that is
It would be nice to have the differences between all these.

4) What is the difference between all these and the Optimized-Voc-Index2k+4kDefault index of my Core10k deck?

In case it wasn't obvious, I'm trying to decide whether to continue using my deck or switch to the new one. And I'm sure new users would be happy to know how all these sorting options were made and what they correspond to, so they can choose what to do.
This information, unless it's already out there, could be added to one of your threads in first post somewhere.
#4
(2017-10-02, 10:42 am)penpex Wrote: Thanks a lot Nukemarine, I couldn't hope for a better answer than one from the deck's creator :)
(Also, thanks for all your efforts and work for the community in general. It's been 5 years since I started looking around for resources to learn Japanese, and your name always ends up showing up!)

I would like more details on a few things:

* snip *

I updated the Core 2k/6k/10k spreadsheet so check it out. It has the columns I discussed. To your questions:

1. Yes, all of these are basically re-sortings of the original smart.fm/iKnow data. Some additional information has been added, and some resources add more images to the Core 6k/10k sentences.

2. I don't have any updated decks. Any decks I made years ago are somewhat out of date but likely usable. Pretty much everything I talk about refers to the spreadsheet, which can be used to make one's own deck.

3. The opt sort I mention is in the spreadsheet I updated just before replying. There are two sorting indexes (opt frequency sort, 2k1/kanken opt sort). If you already know RTK, just stick to the opt freq sort. If you haven't studied kanji yet, use the 2k1/kanken sort: first "Learn 555 in book 1 of 2001.Kanji.Odyssey in RTK order", followed by the first 1000 words. Then "Next 555 kanji in book 2 of 2001.Kanji.Odyssey in RTK order", followed by another 1000 words.

4. I don't know how the opt core 2k/4k index is sorted. There are a few decks that sort using RTK order and others that sort by sentences.

I get that there's a lot of material out there. Most of it is just different flavors of the existing Core 2k/6k items. Almost all of them are fine and get you to the same destination. I just set up the decks around the idea of learning kanji/grammar/vocabulary in a more structured and optimized manner. There are no decks because I opted to use Memrise, for a variety of reasons such as updates I make being seen immediately by everyone, which is not the case with Anki.
Edited: 2017-10-02, 10:53 pm
#5
Thanks a lot again for your replies.

Quote:I get that there's a lot of material out there. Most of it is just different flavors of the existing Core 2k/6k items. Almost all of them are fine and get you to the same destination. I just set up the decks around the idea of learning kanji/grammar/vocabulary in a more structured and optimized manner. There are no decks because I opted to use Memrise, for a variety of reasons such as updates I make being seen immediately by everyone, which is not the case with Anki.
I understand that for sure, and having almost fallen into the pitfall of spending more time optimizing than actually learning myself, I get it. But I think that spending a little time researching and optimizing at first is greatly beneficial, especially for large-scale learning like 6000-10000 cards. At that point, even review intervals, leeches and the like matter, but that's another story.

The Memrise route is a respectable choice for sure, but since I already know Anki, I see no personal advantage in switching to Memrise so far (is there one in my case?), so I'll stick with the decks.


So optimization is why one question keeps coming back in my posts that you haven't specifically addressed so far: can you detail exactly how you did the sorting for the 2k1-Kanken Opt Sort (so that I could reproduce it if I wanted to)?

You said at the beginning

Quote:2k1-Kanken Opt Sort - The order I use. I used groups of 2000 words (Core 2k, then 4k, then 6k, etc).
but to be honest, it's not really clear to me.
Also

Quote:2. I don't have any updated decks created. Any decks that I did make years ago are somewhat out of date but likely usable. Pretty much anything I talk about is with the spreadsheet that can be used to make one's own deck.
I meant the spreadsheet of course.

So what I would like to know is: for this sorting, did you take the Smart.fm database, apply a frequency sorting based on some list you found (KO2001), and then apply an i+1 method to get this ordering? Or is there anything else?
Links to the actual lists and tools would be a plus, of course :)

Again, this information may already be scattered around somewhere, but in any case I think it'd be useful to more people than just me :)
#6
Quote: So optimization is why one question keeps coming back in my posts that you haven't specifically addressed so far: can you detail exactly how you did the sorting for the 2k1-Kanken Opt Sort (so that I could reproduce it if I wanted to)?
OK. First I put together the ordered list of kanji: the 2001 kanji from the 2001.Kanji.Odyssey book series, then the remaining kanji from RTK books 1 & 3, then the rest (about 3500 kanji) from the Kanken list. You can see this sorted order in this spreadsheet.
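The kanji-list step amounts to concatenating the three sources in priority order and skipping any kanji already listed. A small Python sketch, with toy lists standing in for the real 2001.Kanji.Odyssey, RTK and Kanken data (the function name is hypothetical):

```python
def build_kanji_order(*sources: list[str]) -> dict[str, int]:
    """Concatenate kanji lists in priority order; a kanji keeps its first rank."""
    rank: dict[str, int] = {}
    for source in sources:
        for kanji in source:
            if kanji not in rank:
                rank[kanji] = len(rank) + 1  # 1-based rank
    return rank


# Toy stand-ins for the real lists (2001.Kanji.Odyssey, RTK 1 & 3, Kanken).
ko2001 = ["一", "二", "人"]
rtk = ["一", "口", "人", "日"]
kanken = ["日", "般"]
print(build_kanji_order(ko2001, rtk, kanken))
# {'一': 1, '二': 2, '人': 3, '口': 4, '日': 5, '般': 6}
```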
Next, I used that list in a text file as the order index, together with a file of the entire Core 2k/6k/10k vocabulary list, and ran both through Cangy's vocabulary sorting program. This program uses the order index and arranges the vocabulary according to the lowest-ranked kanji each word uses (most likely the least frequently used one). The output just adds the index that I list as the 2k1-Kanken index. Note that kana-only words get an index of 0.
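Going by the description above, the index the program assigns could be sketched like this: a word's index is the rank of its lowest-ranked (highest-numbered) kanji, and kana-only words get 0. This is an assumption about the program's behavior, not its actual code; treating kanji missing from the list as ranked after everything else, and detecting kanji by the U+4E00 to U+9FFF range, are also assumptions.

```python
def is_kanji(ch: str) -> bool:
    # Assumption: only the CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def word_index(word: str, rank: dict[str, int]) -> int:
    """Rank of the word's lowest-ranked kanji; 0 if the word is kana-only."""
    # Assumption: kanji missing from the list rank after every listed kanji.
    ranks = [rank.get(ch, len(rank) + 1) for ch in word if is_kanji(ch)]
    return max(ranks, default=0)


rank = {"一": 1, "日": 5, "般": 300}
vocab = ["一般", "それ", "一つ", "日"]
print({w: word_index(w, rank) for w in vocab})
# {'一般': 300, 'それ': 0, '一つ': 1, '日': 5}
print(sorted(vocab, key=lambda w: word_index(w, rank)))
# ['それ', '一つ', '日', '一般']
```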
Next, I put this index in the Core 2k/6k/10k spreadsheet and created a "group index" that splits Core 2k/6k/10k into groups of 1000 words. I sort by group, then by 2k1-kanken index. I create a new column, "temp opt sort". I spread out the kana-only words from 1 to 1000 (e.g. if there are 50 kana-only words, they get 1, 21, 41 ... 981). I do this for all 10 groups and note how many kana-only words are in each group.
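The spreading step is simple arithmetic: with n kana-only words in a group of 1000, word i (counting from 0) goes to slot 1 + (i × 1000) // n in integer arithmetic. A Python sketch (hypothetical helper name) that reproduces the 1, 21, 41 ... 981 example:

```python
def spread_positions(count: int, size: int = 1000) -> list[int]:
    """Evenly spaced 1-based slots for `count` words across a group of `size` slots."""
    return [1 + i * size // count for i in range(count)]


slots = spread_positions(50)
print(slots[:3], slots[-1])  # [1, 21, 41] 981
```

An empty group simply yields an empty list, since the range is empty and no division happens.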
Now comes the tedious part. For the first 1000 words (group 1), I only want to use 2k1-kanken indexes 1 - 555, so I sort by 2k1-kanken, highlight the words with an index of 555 or lower, then sort those by group. After that, I take the first (1000 - group 1 kana count) words and spread them out from 1 to 1000 in "temp opt sort". For the second 1000 words (group 2), I do it again, but this time with indexes 1 - 1110. For the third 1000 words (group 3), I use indexes 1 - 1550. For group 4 it's 1 - 2001. After that, all 2k1-kanken indexes are used. I then sort by "temp opt sort", which blends the kana and kanji words, create a column "2k1-kanken Opt Sort" and number the words 1 to 10,000. Done.
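The spreadsheet procedure above can be approximated in code. The Python sketch below is one possible reading of it, not Nukemarine's actual workflow: kana-only words stay in their own Core group and are spread evenly; each target group then takes its kanji words from the not-yet-used words whose 2k1-kanken index is within that group's limit (555, 1110, 1550, 2001, then unlimited), ordered by original group and then index, and spreads them evenly as well. The Word class, the float slot positions, the kana-before-kanji tie-break and the rule that the last group absorbs any leftovers are all assumptions, and core positions are assumed to run 1..N.

```python
from dataclasses import dataclass


@dataclass
class Word:
    core_pos: int    # 1-based position in the original Core order (assumed to run 1..N)
    kanken_idx: int  # 2k1-kanken index from the previous step; 0 = kana-only


def spread(count: int, size: int) -> list[float]:
    """Evenly spaced 1-based slots; floats so kana and kanji slots can interleave."""
    return [1 + i * size / count for i in range(count)]


def opt_sort(words: list[Word], group_size: int = 1000,
             limits: tuple[int, ...] = (555, 1110, 1550, 2001)) -> list[Word]:
    n_groups = -(-len(words) // group_size)  # ceiling division

    def core_group(w: Word) -> int:
        return (w.core_pos - 1) // group_size

    slots = []  # (target group, temp opt sort position, 0 = kana / 1 = kanji, word)

    # Kana-only words stay in their own Core group, spread evenly across it.
    kana_count = []
    for g in range(n_groups):
        kana = sorted((w for w in words if w.kanken_idx == 0 and core_group(w) == g),
                      key=lambda w: w.core_pos)
        kana_count.append(len(kana))
        slots += [(g, pos, 0, w) for pos, w in zip(spread(len(kana), group_size), kana)]

    # Kanji words: each target group only draws from words within its kanji limit.
    pool = [w for w in words if w.kanken_idx > 0]
    for g in range(n_groups):
        last = g == n_groups - 1
        limit = limits[g] if g < len(limits) and not last else float("inf")
        eligible = sorted((w for w in pool if w.kanken_idx <= limit),
                          key=lambda w: (core_group(w), w.kanken_idx, w.core_pos))
        # The last group absorbs any leftovers so that no word is dropped.
        take = eligible if last else eligible[:group_size - kana_count[g]]
        taken = {w.core_pos for w in take}
        pool = [w for w in pool if w.core_pos not in taken]
        slots += [(g, pos, 1, w) for pos, w in zip(spread(len(take), group_size), take)]

    slots.sort(key=lambda s: (s[0], s[1], s[2], s[3].core_pos))
    return [s[3] for s in slots]


idx = [0, 3, 1, 2, 0, 1, 5, 2]  # toy 2k1-kanken indexes for core positions 1..8
words = [Word(i + 1, k) for i, k in enumerate(idx)]
print([w.core_pos for w in opt_sort(words, group_size=4, limits=(2,))])
# [1, 3, 4, 6, 5, 2, 8, 7]
```

In this toy run the first group only contains kanji words with an index of 2 or lower, mirroring how group 1 is limited to the first 555 kanji in the real sort.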
For frequency opt sorting, it's much, much easier. I still use groups of 1000. Each group is then sorted by 2k1-kanken index, with kana-only words keeping their original positions.
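The frequency variant can be sketched the same way: within each group of 1000 words (in Core order), kana-only words keep their slots and the remaining slots are refilled with that group's kanji words sorted by 2k1-kanken index. In this Python sketch (hypothetical function name), ties keep the original order.

```python
def freq_opt_sort(indexes: list[int], group_size: int = 1000) -> list[int]:
    """indexes: 2k1-kanken index per word in Core order; 0 = kana-only.

    Returns 0-based word positions (into `indexes`) in the new study order.
    """
    order = []
    for start in range(0, len(indexes), group_size):
        group = list(range(start, min(start + group_size, len(indexes))))
        kanji_sorted = iter(sorted((i for i in group if indexes[i] > 0),
                                   key=lambda i: (indexes[i], i)))
        # Kana-only words keep their slot; every other slot takes the next kanji word.
        order += [i if indexes[i] == 0 else next(kanji_sorted) for i in group]
    return order


print(freq_opt_sort([5, 0, 2, 9, 0, 1], group_size=3))  # [2, 1, 0, 5, 4, 3]
```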