kanji koohii FORUM
Core 10k - optimized i+1 version - Printable Version




Core 10k - optimized i+1 version - pmnox - 2013-08-20

I have extracted the list of all new words with new readings that appear in Core 10k and are not yet in Core 6k. I'm trying to figure out how to sort this list into an order optimized for learning.
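A minimal sketch of this extraction step, assuming each deck is available as a list of (word, reading) pairs. The function name and the sample entries are made up for illustration and do not come from the actual decks.

```python
def new_entries(core10k, core6k):
    """Return the (word, reading) pairs of core10k that are absent from core6k.

    The original order is kept and repeated pairs are emitted only once.
    """
    known = set(core6k)
    seen = set()
    result = []
    for entry in core10k:
        if entry not in known and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


# Illustrative data only, not taken from the real decks.
core6k = [("人", "ひと"), ("先", "さき")]
core10k = [("人", "ひと"), ("人", "じん"), ("返却", "へんきゃく"), ("先", "さき")]
new_words = new_entries(core10k, core6k)
```

Comparing on the pair rather than on the word alone keeps a known word that shows up with a new reading.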

I've also changed the new cards from Core 10k from recognition cards to production cards.

I read that Core 2k/6k Optimized Japanese Vocabulary was sorted using Cangy's program via 2001.Kanji.Odyssey kanji order.
Does anyone know where I can find Cangy's program or the 2001.Kanji.Odyssey kanji order?


Btw, I've almost finished the Core 2k. It took me two weeks to get through the first 1500 kanjis. So, it's going to take me at least 2 months to finish Core 6k, but I would like to prepare the material ahead.



Here is the V18 version of the whole core 10k optimized i+1 sound+pictures.
https://ankiweb.net/shared/info/702754122

Image and audio files are provided separately:
for cards 0k-10k: https://ankiweb.net/shared/info/959273281
for cards 0k-6k: https://ankiweb.net/shared/info/1692722392
for cards 6k-10k: https://ankiweb.net/shared/info/169008752

Here is the supplement, an additional set of 15 cards that are in core 10k taken from Tanuki-Ultima deck and CorePlus deck.
https://ankiweb.net/shared/info/1132075078

Here are txt files used to generate those decks:
https://www.dropbox.com/revisions/ios%20development/2013/mature/core6k_base17.txt
https://www.dropbox.com/revisions/ios%20development/2013/mature/core10k6k_base15.txt
https://www.dropbox.com/revisions/ios%20development/2013/mature/core14k_base2.txt
https://www.dropbox.com/revisions/ios%20development/2013/mature/coreplus_base7.txt

TO DO LIST:
any suggestions?


Core 10k - optimized i+1 version - zurisu - 2013-08-21

I have no idea how to help with the sorting, but ありがとう for working on this; I'm sure a lot of people will find such a deck extremely useful! (╹◡╹)


Core 10k - optimized i+1 version - ryanjmack - 2013-08-21

zurisu Wrote:I have no idea how to help with the sorting, but ありがとう for working on this; I'm sure a lot of people will find such a deck extremely useful! (╹◡╹)
I second this.


Core 10k - optimized i+1 version - pmnox - 2013-08-21

I can write the program to sort it myself. I just don't know where to find the 2001.Kanji.Odyssey kanji order.

In the worst case I'll have to write an algorithm that recreates the order from the Core 6k optimized i+1 deck, and then another one that does the sorting. I'll have to spread the words with no kanji evenly just to avoid them being grouped together.


Core 10k - optimized i+1 version - RawToast - 2013-08-21

Cangy's stuff:

http://forum.koohii.com/showthread.php?tid=5091
https://sites.google.com/site/ankinihongo/

You may find some of cb4960's tools useful, you could add vocab frequency, etc.


Core 10k - optimized i+1 version - tashippy - 2013-08-21

Maybe you could ask Nukemarine, he's the hero who optimized the original decks.

Maybe this could help: http://forum.koohii.com/showthread.php?tid=10932 I know it probably wouldn't be too helpful since you're already using Cangy for the first 6k, but it does avoid repeats.


Core 10k - optimized i+1 version - Splatted - 2013-08-21

You could also use morphology and optimise it as you go.


Core 10k - optimized i+1 version - ktcgx - 2013-08-21

I have a question, mostly related... What does it mean by "optimised"? I see it a lot on anki related threads here, but I don't really understand what is meant by it... Optimised for what?


Core 10k - optimized i+1 version - sholum - 2013-08-21

ktcgx Wrote:I have a question, mostly related... What does it mean by "optimised"? I see it a lot on anki related threads here, but I don't really understand what is meant by it... Optimised for what?
Usually it's optimized for use with KO2001 by sorting the words by kanji corresponding to the order in KO2001.

The main thing is just that it's grouped by kanji, which means that related words and readings will pop up close together, making it easier to internalize how that kanji is used in that group of words. It's much more efficient than the regular frequency list that you get with the Core decks.

In this case though, it looks like the OP wants to also optimize the sentences to be n+1 (or at least as close as possible). This means that each new sentence only has one word that you haven't seen yet. This isn't going to be the case with all of the sentences, but it helps avoid the problem of suddenly having an example sentence that you can't understand at all without looking at the translation.
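The n+1 idea can be checked mechanically. Below is a rough sketch, assuming every sentence has already been split into words (the thread does not say how, so the tokenizer is left out). The card layout and function names are hypothetical.

```python
def is_n_plus_one(sentence_words, known):
    """True when exactly one distinct word in the sentence is not yet known."""
    unknown = {w for w in sentence_words if w not in known}
    return len(unknown) == 1


def count_n_plus_one(cards):
    """Count the cards whose sentence is n+1 when studied in the given order.

    cards: list of (target_word, sentence_words) tuples in deck order.
    """
    known = set()
    hits = 0
    for target, words in cards:
        if is_n_plus_one(words, known):
            hits += 1
        # After studying a card, all of its words count as seen.
        known.update(words)
        known.add(target)
    return hits


# Illustrative, pre-tokenized cards.
cards = [
    ("猫", ["猫"]),
    ("好き", ["猫", "好き"]),
    ("犬", ["犬", "鳥", "好き"]),
]
n_plus_one_cards = count_n_plus_one(cards)
```

Running the count on two candidate orderings of the same deck gives a simple way to compare them.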


Core 10k - optimized i+1 version - Nukemarine - 2013-08-22

What I did

1. Create the "optimized kanji" list (2k1, remaining RTK1 and RTK3 then rest of the 6000 kanji in my list)
2. Put the Core 10k list to sort with Cangy's program into a text file
3. Sort the vocabulary list using Cangy's program getting a Sort Index
4. Copy that sort index into a spread sheet with indexes for Core 10k order.
5. I then spread the kana-only words by finding the ratio of kanji to kana words. I have a temp column putting multiples of that ratio beside the kana words, then do a number count by the kanji words. When I now sort by the temp column, the kana words are evenly spaced.
6. Create a permanent index called "Opt-Vocab-Sort", number the entries 6001 to 10,000 for use with Anki later.
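Step 5 can also be done outside a spreadsheet. The sketch below is one reading of that description, assuming the kanji words are already sorted and the kana-only words are in a separate list. The function name is made up.

```python
def spread_evenly(kanji_words, kana_words):
    """Interleave kana-only words at even intervals among the sorted kanji words."""
    if not kana_words:
        return list(kanji_words)
    ratio = len(kanji_words) / len(kana_words)
    # Kanji words get the running count 1, 2, 3, ... as their temp key.
    keyed = [(i, 0, w) for i, w in enumerate(kanji_words, start=1)]
    # Kana words get multiples of the ratio as their temp key.
    keyed += [(j * ratio, 1, w) for j, w in enumerate(kana_words, start=1)]
    # On a tie the kanji word (flag 0) comes before the kana word (flag 1).
    keyed.sort(key=lambda t: (t[0], t[1]))
    return [w for _, _, w in keyed]


mixed = spread_evenly(["一", "二", "三", "四", "五", "六"], ["あ", "い"])
```

With six kanji words and two kana words the ratio is 3, so a kana word lands after every third kanji word.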

Probably sounds complicated, but it makes sense. The kanji sorting order that really matters is the 2k1. I added in the remaining RTK1 and 3 kanji in Heisig order and finally the rest of the 6,000 kanji from a large Kanji spreadsheet that's been around for years. All I had to do was paste that list into a text file for Cangy's sorting program to reference it.
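Building the combined kanji list amounts to concatenating the source lists and keeping only the first occurrence of each kanji. A small sketch, with short placeholder strings standing in for the real 2k1, RTK1, RTK3 and spreadsheet orders (the sample characters are illustrative, not the actual orders):

```python
def merge_kanji_orders(*orders):
    """Concatenate several kanji orders, keeping each kanji at its first position."""
    seen = set()
    merged = []
    for order in orders:
        for kanji in order:
            if kanji not in seen:
                seen.add(kanji)
                merged.append(kanji)
    return merged


# Placeholder inputs only.
ko2001 = "日一人"
rtk1 = "一二三日"
rtk3 = "此"
rest = "人乙"
optimized = "".join(merge_kanji_orders(ko2001, rtk1, rtk3, rest))
```

Writing one kanji per line to a text file would then give a reference list of the kind described here. The input format Cangy's program expects is not stated in the thread.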

I only sort vocabulary words as I found that sorted sentences created 10+ sentences that are only grouped because a simple word like 彼 popped up finally. I also spread out the kana words as it got boring learning them as the only way they're sorted is by kana and no connective meaning to them. If you learned in the original 10k order, you'll understand how tedious this gets learning in dictionary order.

Personally, I like the idea of vocabulary words grouped in bundles of 1,000 by frequency of use in literature. Those bundles are then sorted by the optimized kanji list with kana-only words spread out evenly. However, Core 2k/6k/10k offers voice-acted sample sentences, so it's a decent trade-off.
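The bundle idea could look roughly like this. It assumes the word list is already sorted by frequency and uses a hypothetical sort key (the position of the latest kanji of the word in the optimized list), since the exact rule used by Cangy's program is not described in this thread. Spreading the kana-only words is left out.

```python
def kanji_key(word, kanji_index):
    """Hypothetical key: position of the word's latest kanji in the optimized list.

    Words without any listed kanji sort to the end of their bundle.
    """
    indexes = [kanji_index[ch] for ch in word if ch in kanji_index]
    return max(indexes) if indexes else len(kanji_index)


def bundle_and_sort(words_by_frequency, kanji_order, bundle_size=1000):
    """Cut the frequency-sorted list into bundles, then sort each bundle by kanji."""
    kanji_index = {k: i for i, k in enumerate(kanji_order)}
    result = []
    for start in range(0, len(words_by_frequency), bundle_size):
        bundle = words_by_frequency[start:start + bundle_size]
        result.extend(sorted(bundle, key=lambda w: kanji_key(w, kanji_index)))
    return result


# Tiny illustrative input with a bundle size of 2 instead of 1,000.
ordered = bundle_and_sort(["日本", "人", "先日", "本"], "人日本先", bundle_size=2)
```

Frequent words stay in the early bundles, and within a bundle the words sharing a kanji end up next to each other.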


Core 10k - optimized i+1 version - RawToast - 2013-08-22

Nukemarine Wrote:Personally, I like the idea of vocabulary words grouped in bundles of 1,000 by frequency of use in literature. Those bundles are then sorted by the optimized kanji list with kana-only words spread out evenly. However, Core 2k/6k/10k offers voice-acted sample sentences, so it's a decent trade-off.
I considered doing this a few months back using the frequency lists created by cb4960's text analyser on the innocent novels.

Instead I ended up just adding high frequency words with audio that I find with Rikaisama.


Core 10k - optimized i+1 version - ryuudou - 2013-08-22

sholum Wrote:
ktcgx Wrote:I have a question, mostly related... What does it mean by "optimised"? I see it a lot on anki related threads here, but I don't really understand what is meant by it... Optimised for what?
Usually it's optimized for use with KO2001 by sorting the words by kanji corresponding to the order in KO2001.

The main thing is just that it's grouped by kanji, which means that related words and readings will pop up close together, making it easier to internalize how that kanji is used in that group of words. It's much more efficient than the regular frequency list that you get with the Core decks.
KO2001 is a frequency list though.


Core 10k - optimized i+1 version - sholum - 2013-08-22

ryuudou Wrote:
sholum Wrote:
ktcgx Wrote:I have a question, mostly related... What does it mean by "optimised"? I see it a lot on anki related threads here, but I don't really understand what is meant by it... Optimised for what?
Usually it's optimized for use with KO2001 by sorting the words by kanji corresponding to the order in KO2001.

The main thing is just that it's grouped by kanji, which means that related words and readings will pop up close together, making it easier to internalize how that kanji is used in that group of words. It's much more efficient than the regular frequency list that you get with the Core decks.
KO2001 is a frequency list though.
Same result. I meant that the vocabulary isn't done by frequency. While going through the optimized Core deck, you'll notice that you're getting words from near the end pretty early on simply because of the kanji in it.
So even if it's by kanji frequency, it's not by word frequency.


Core 10k - optimized i+1 version - ryuudou - 2013-08-23

sholum Wrote:
ryuudou Wrote:
sholum Wrote:Usually it's optimized for use with KO2001 by sorting the words by kanji corresponding to the order in KO2001.

The main thing is just that it's grouped by kanji, which means that related words and readings will pop up close together, making it easier to internalize how that kanji is used in that group of words. It's much more efficient than the regular frequency list that you get with the Core decks.
KO2001 is a frequency list though.
Same result. I meant that the vocabulary isn't done by frequency. While going through the optimized Core deck, you'll notice that you're getting words from near the end pretty early on simply because of the kanji in it.
So even if it's by kanji frequency, it's not by word frequency.
What does kanji frequency have to do with grouping?


Core 10k - optimized i+1 version - pmnox - 2013-08-23

Nukemarine Wrote:What I did

1. Create the "optimized kanji" list (2k1, remaining RTK1 and RTK3 then rest of the 6000 kanji in my list)
2. Put the Core 10k list to sort with Cangy's program into a text file
3. Sort the vocabulary list using Cangy's program getting a Sort Index
4. Copy that sort index into a spread sheet with indexes for Core 10k order.
5. I then spread the kana-only words by finding the ratio of kanji to kana words. I have a temp column putting multiples of that ratio beside the kana words, then do a number count by the kanji words. When I now sort by the temp column, the kana words are evenly spaced.
6. Create a permanent index called "Opt-Vocab-Sort", number the entries 6001 to 10,000 for use with Anki later.

Probably sounds complicated, but it makes sense. The kanji sorting order that really matters is the 2k1. I added in the remaining RTK1 and 3 kanji in Heisig order and finally the rest of the 6,000 kanji from a large Kanji spreadsheet that's been around for years. All I had to do was paste that list into a text file for Cangy's sorting program to reference it.

I only sort vocabulary words as I found that sorted sentences created 10+ sentences that are only grouped because a simple word like 彼 popped up finally. I also spread out the kana words as it got boring learning them as the only way they're sorted is by kana and no connective meaning to them. If you learned in the original 10k order, you'll understand how tedious this gets learning in dictionary order.

Personally, I like the idea of vocabulary words grouped in bundles of 1,000 by frequency of use in literature. Those bundles are then sorted by the optimized kanji list with kana-only words spread out evenly. However, Core 2k/6k/10k offers voice-acted sample sentences, so it's a decent trade-off.
Could you publish the indexed i+1 Core 2k/6k/10k deck or at the very least the "optimized kanji list" ?


Core 10k - optimized i+1 version - sholum - 2013-08-24

ryuudou Wrote:
sholum Wrote:
ryuudou Wrote:KO2001 is a frequency list though.
Same result. I meant that the vocabulary isn't done by frequency. While going through the optimized Core deck, you'll notice that you're getting words from near the end pretty early on simply because of the kanji in it.
So even if it's by kanji frequency, it's not by word frequency.
What does kanji frequency have to do with grouping?
The Optimized Core decks use KO2001 to determine the order in which they show up. I don't remember exactly what other tweaks were made to it, but for the most part, words that use the same kanji show up near each other assuming you don't mess with the new card order. This creates a grouping of similar words in many cases.
I can't tell if you really don't understand what I'm trying to say (possible communication errors on my part) or if you're just doing one of those acts where you try to argue against my statement without actually arguing. If it's the latter, please stop; it's annoying. Contest my statements if you want, I don't care.
If it's the former, please clarify your question. If I'm not answering your question properly, I don't understand it.


Core 10k - optimized i+1 version - ryuudou - 2013-08-24

How can vocab be grouped by similar kanji if the vocab are ordered by kanji frequency? If the vocab are being ordered by kanji frequency then similar kanji will not be in groups, and the words will not come in order of frequency.

Unless you don't mean ordered but grouped as in going through a group of frequent words not by order of frequency.


Core 10k - optimized i+1 version - Vempele - 2013-08-24

ryuudou Wrote:How can vocab be grouped by similar kanji if the vocab are ordered by kanji frequency?
No one said anything about similar kanji.


Core 10k - optimized i+1 version - ryuudou - 2013-08-24

Vempele Wrote:
ryuudou Wrote:How can vocab be grouped by similar kanji if the vocab are ordered by kanji frequency?
No one said anything about similar kanji.
sholum Wrote:words that use the same kanji show up near each other



Core 10k - optimized i+1 version - Vempele - 2013-08-24

Same is not the same as similar.


Core 10k - optimized i+1 version - pmnox - 2013-08-24

Btw, I'm close to finishing Core 2k, so I added the missing pictures to all cards in Core 6k just to make the learning process less boring. Should I publish my Core 6k deck with the added pictures now, or wait for Core 10k to be updated and then publish a version containing all the pictures?


Core 10k - optimized i+1 version - ryuudou - 2013-08-24

Oh. I misinterpreted.


Core 10k - optimized i+1 version - pmnox - 2013-08-25

Nukemarine Wrote:What I did

1. Create the "optimized kanji" list (2k1, remaining RTK1 and RTK3 then rest of the 6000 kanji in my list)
2. Put the Core 10k list to sort with Cangy's program into a text file
3. Sort the vocabulary list using Cangy's program getting a Sort Index
4. Copy that sort index into a spread sheet with indexes for Core 10k order.
5. I then spread the kana-only words by finding the ratio of kanji to kana words. I have a temp column putting multiples of that ratio beside the kana words, then do a number count by the kanji words. When I now sort by the temp column, the kana words are evenly spaced.
6. Create a permanent index called "Opt-Vocab-Sort", number the entries 6001 to 10,000 for use with Anki later.

Probably sounds complicated, but it makes sense. The kanji sorting order that really matters is the 2k1. I added in the remaining RTK1 and 3 kanji in Heisig order and finally the rest of the 6,000 kanji from a large Kanji spreadsheet that's been around for years. All I had to do was paste that list into a text file for Cangy's sorting program to reference it.

I only sort vocabulary words as I found that sorted sentences created 10+ sentences that are only grouped because a simple word like 彼 popped up finally. I also spread out the kana words as it got boring learning them as the only way they're sorted is by kana and no connective meaning to them. If you learned in the original 10k order, you'll understand how tedious this gets learning in dictionary order.

Personally, I like the idea of vocabulary words grouped in bundles of 1,000 by frequency of use in literature. Those bundles are then sorted by the optimized kanji list with kana-only words spread out evenly. However, Core 2k/6k/10k offers voice-acted sample sentences, so it's a decent trade-off.
There are some issues with the ordering that you used. Words that contain one rare kanji would be grouped near the end of the set, which makes learning harder.

For example:
返却
老人
先輩
etc.

Those words would appear alone at the end, although it is possible to place them near their most common kanji instead: 返却 -> 返, 老人 -> 人, 先輩 -> 先.
So what I'm going to do is sort each expression by the maximum index among the kanji in it that appear at least twice in the text.

As a result, words that contain only kana, or only kanji that appear once, would be spread evenly across the deck. Words that contain a mix of common and rare kanji would be placed in the list near the common kanji they contain.

This way of sorting solves the issue of grouping rare kanjis at the end of each block.
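A sketch of the sort key described above, assuming "the text" means the word list itself and that each kanji is counted once per word. The names, the toy kanji order and the sample words are illustrative.

```python
from collections import Counter


def rare_aware_keys(words, kanji_order, min_count=2):
    """Map each word to the highest index among its kanji that occur in at
    least min_count words, or to None when it has no such kanji."""
    kanji_index = {k: i for i, k in enumerate(kanji_order)}
    # Count in how many words each listed kanji occurs.
    counts = Counter(ch for w in words for ch in set(w) if ch in kanji_index)
    keys = {}
    for w in words:
        common = [kanji_index[ch] for ch in w if counts[ch] >= min_count]
        keys[w] = max(common) if common else None
    return keys


words = ["人", "老人", "先", "先輩", "返す", "返却", "これ"]
keys = rare_aware_keys(words, "人先返老輩却")
# Words with a key are grouped by it; the rest are left over for even spreading.
grouped = sorted((w for w in words if keys[w] is not None), key=keys.get)
leftover = [w for w in words if keys[w] is None]
```

In this toy list 老, 輩 and 却 each occur in a single word, so 老人, 先輩 and 返却 are keyed by 人, 先 and 返. The kana-only これ has no key and is left for even spreading.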


Core 10k - optimized i+1 version - tashippy - 2013-08-25

Nukemarine Wrote:Personally, I like the idea of vocabulary words grouped in bundles of 1,000 by frequency of use in literature. Those bundles are then sorted by the optimized kanji list with kana-only words spread out evenly. However, Core 2k/6k/10k offers voice-acted sample sentences, so it's a decent trade-off.
This is exactly my feeling. I tried creating personal decks with words I saw in literature I was reading, but the words weren't sticking the way the optimized order deck of newspaper words with audio for the sample sentences was. I'm not finished with Core 6k yet.
@pmnox, either way can't hurt. Personally, I will finish with the 6k deck I have but if you put up some great 6-10 deck when I get there I'll certainly appreciate that.


Core 10k - optimized i+1 version - pmnox - 2013-08-25

I have prepared the deck, but I'm planning to make some improvements.
So far I have added pictures, a sort index based on the list of all 3824 words, and a field that holds the sentence without the given word.
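The "sentence without the given word" field could be generated along these lines. This is only a guess at the approach: the function name and the placeholder are made up, and a plain string replacement will miss inflected forms of the word.

```python
def blank_out(sentence, word, placeholder="＿＿"):
    """Return the sentence with every exact occurrence of word replaced."""
    return sentence.replace(word, placeholder)


cloze = blank_out("彼は先輩です。", "先輩")
```

For verbs and adjectives the dictionary form often does not appear verbatim in the sentence, so those cards would need a separate pass or manual fixes.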

Here is the beta version of core10k - core6k deck: https://ankiweb.net/shared/info/163007112
Here is the beta version of core6k - core2k deck:
https://ankiweb.net/shared/info/274832392
Let me know if you have any suggestions.

I need to find the frequency list so that I can divide those words into groups of 1k,2k cards and then sort them as well. I'll do that later.

Cangy said that he will send me tools that he used before. I'm going to make some changes to those decks once I get them.