kanji koohii FORUM
Core 10k - optimized i+1 version - Printable Version



Core 10k - optimized i+1 version - pmnox - 2013-08-27

Hmm. I don't know how to match records from core-6000.txt with ones from "Core 2k/6k Optimized Japanese Vocabulary".

The core indexes differ and I was unable to find a way to do 1-1 matching. There are a lot of mistakes in "Core 2k/6k Optimized Japanese Vocabulary", which makes it harder to do a 1-1 match.

The algorithm that I wrote finds correct matches for 5950 entries. I guess I'll have to manually fix the rest.


Core 10k - optimized i+1 version - pmnox - 2013-08-27

I was only able to match 5810/5999 sentences. I guess it's not worth the effort to fix the remaining ~200, as that's just 3% of the deck.

5810 is the number of sentences for which both the reading of the sentence and the reading of the word match. Of course, that's after doing lots of preprocessing. I'm going to compare those decks side by side to see if it's worth updating.
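
For anyone who wants to try this kind of matching themselves, here is a rough Python sketch of the idea described above: normalize the readings, then pair records whose word reading and sentence reading both agree. It is not the script actually used for this deck. The field names (`word_reading`, `sentence_reading`) and the specific preprocessing steps (NFKC normalization, katakana to hiragana, stripping spaces and punctuation) are assumptions made for illustration.

```python
import re
import unicodedata


def kata_to_hira(text):
    # Shift katakana (U+30A1..U+30F6) down to the matching hiragana.
    return "".join(
        chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in text
    )


def normalize(reading):
    # Assumed preprocessing: unify width variants, kana type, and
    # drop whitespace and common punctuation.
    text = unicodedata.normalize("NFKC", reading)
    text = kata_to_hira(text)
    return re.sub(r"[\s、。,.!?「」・]", "", text)


def match_decks(deck_a, deck_b):
    # Index deck_b by (word reading, sentence reading).
    index = {}
    for rec in deck_b:
        key = (normalize(rec["word_reading"]), normalize(rec["sentence_reading"]))
        index.setdefault(key, []).append(rec)

    matched, unmatched = [], []
    for rec in deck_a:
        key = (normalize(rec["word_reading"]), normalize(rec["sentence_reading"]))
        candidates = index.get(key, [])
        # Only accept unambiguous 1-1 matches; the rest need manual review.
        if len(candidates) == 1:
            matched.append((rec, candidates[0]))
        else:
            unmatched.append(rec)
    return matched, unmatched
```

Requiring both readings to agree is deliberately strict: it leaves some records unmatched, but it avoids pairing a word with the wrong sentence.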


Core 10k - optimized i+1 version - pmnox - 2013-08-27

Out of those 5810 sentences, only 180 have different English keywords or English sentences, and most of those changes are minor. So it's not worth the effort.

I guess, the only thing left to do is adding the Japanese pitch accent.


Core 10k - optimized i+1 version - pmnox - 2013-08-27

I uploaded an updated version. It has all the necessary indexes.
https://ankiweb.net/shared/info/702754122

The only thing that is missing is the Japanese pitch accent.


Core 10k - optimized i+1 version - pmnox - 2013-08-27

nothing here


Core 10k - optimized i+1 version - NinKenDo - 2013-08-27

So is this the final version besides the addition of pitch accents?


Core 10k - optimized i+1 version - pmnox - 2013-08-27

NinKenDo Wrote:So is this the final version besides the addition of pitch accents?
Yes, it is more like a release candidate. It contains all the necessary features. I'm waiting for feedback, as there may still be some unexpected bugs.


I just remembered one minor issue that I'll fix tomorrow.


Core 10k - optimized i+1 version - killua - 2013-08-28

EDIT:
I guess I'm the only one who is obsessed with that new version. Big Grin

Don't worry, I'll do the matching by myself, you already did a great job.


Core 10k - optimized i+1 version - pmnox - 2013-08-28

killua Wrote:EDIT:
I guess I'm the only one who is obsessed with that new version. Big Grin

Don't worry, I'll do the matching by myself, you already did a great job.
One of the issues that I wanted to solve was to provide a universal core-index that stays the same, to make it easier to match the records. From now on I want to avoid changing the order of columns, again to make the matching easier.

I also made some small changes to the sorting algorithm.

Here is the v3 version:
https://ankiweb.net/shared/info/702754122


Core 10k - optimized i+1 version - lauri_ranta - 2013-08-28

I just realized that the TSV file that core-6000.txt was based on (iknow_core6k_complete.tsv, made by Savii) was edited by Savii to improve some of the translations. It also has some other changes compared to the original JSON files extracted from iKnow's website. See http://forum.koohii.com/showthread.php?pid=185115#pid185115.

I only made a few changes in core-6000.txt, so it might be better to use the files from the thread linked above as a source.


Core 10k - optimized i+1 version - pmnox - 2013-08-28

Hmm. Maybe I should include a Japanese definition of the word as well. What do you think?


Core 10k - optimized i+1 version - pmnox - 2013-08-28

Hmm. Maybe I should include a Japanese definition of the word as well. What do you think?

Btw, I was able to match the pitch accent for 5619 out of 5999 cards.
I couldn't find an accent for 272 cards, and for 108 cards the kanji matched but the reading didn't, so I decided not to use a pitch accent for these.

I'll do the same for the last 4k cards as well.
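
The three outcomes above (accent found, no entry at all, kanji found but reading differs) can be sketched roughly like this in Python. This is only an illustration, not the script used for the deck: the layout of `accent_dict` (kanji mapped to a dict of reading and accent number) and the card field names `word` and `reading` are assumptions.

```python
def assign_pitch_accent(cards, accent_dict):
    """Split cards into matched / not found / reading mismatch.

    accent_dict is assumed to map a kanji spelling to a dict of
    {reading: accent_number}; this layout is hypothetical.
    """
    matched, not_found, reading_mismatch = [], [], []
    for card in cards:
        entries = accent_dict.get(card["word"])
        if entries is None:
            # No entry for this spelling at all.
            not_found.append(card)
        elif card["reading"] in entries:
            # Spelling and reading both agree: safe to attach the accent.
            matched.append({**card, "accent": entries[card["reading"]]})
        else:
            # Spelling exists but with a different reading: skip the accent
            # rather than risk attaching the wrong one.
            reading_mismatch.append(card)
    return matched, not_found, reading_mismatch
```

The three buckets should always add up to the number of cards, which is a quick sanity check: 5619 + 272 + 108 = 5999.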


Core 10k - optimized i+1 version - pmnox - 2013-08-28

lauri_ranta Wrote:I just realized that the TSV file I used as the source for core-6000.txt had many differences compared to the original JSON files extracted from iKnow's website. See http://forum.koohii.com/showthread.php?pid=185115#pid185115.
If most people are in favor of using the new core-6000.txt file, then I'll rebuild the database from it. The only issue is the lack of sound files for about 50 new words and about 100 sentences. Should I just leave those empty?

Most of the changes are minor, like extra dots, commas, etc.

So I would like to start a vote on whether to rebuild the database from core-6000.txt or not.

EDIT:
I found audio files for the core-6000.txt deck. So I guess there is no reason not to rebuild it.
In that case I'll rebuild the whole deck in the next few days using core-6000.txt.


Core 10k - optimized i+1 version - pmnox - 2013-08-28

I just posted the v4 version, which was completely rebuilt using the core-6000.txt deck.
https://ankiweb.net/shared/info/702754122

It should be up in about 15 minutes.

EDIT:
It's up.


Core 10k - optimized i+1 version - ryuudou - 2013-08-28

When it comes to Core2k/6k, can you summarize the changes in this deck compared to Nukemarine's deck? Sorry.

To my understanding, you did what he did, and then tweaked his order for vocabulary that has one rare kanji?


Core 10k - optimized i+1 version - killua - 2013-08-28

You are my hero, pmnox!

Quote:I found audio files for core-6000.txt deck.
The link was on the first line of the file. Smile


Core 10k - optimized i+1 version - pmnox - 2013-08-28

killua Wrote:You are my hero, pmnox!

Quote:I found audio files for core-6000.txt deck.
The link was on the first line of the file. Smile
Thanks.

How is your progress through the core deck?

Btw, version 05 is up. I noticed that pictures and audio didn't match after the update.


Core 10k - optimized i+1 version - killua - 2013-08-28

pmnox Wrote:How is your progress through core deck?
I'm going to finish the first 2k in a couple of days.
I'm doing recognition only.


Core 10k - optimized i+1 version - pmnox - 2013-08-28

killua Wrote:
pmnox Wrote:How is your progress through core deck?
I'm going to finish the first 2k in a couple of days.
I'm doing recognition only.
What do you think about doing a hardcore 10k index? You are certainly ambitious. xD


Core 10k - optimized i+1 version - killua - 2013-08-28

pmnox Wrote:What do you think about doing a hardcore 10k index? You are certainly ambitious. xD
Do you mean no frequency groups at all? All 10k sorted as a whole?


Core 10k - optimized i+1 version - pmnox - 2013-08-28

Yes, exactly.


Core 10k - optimized i+1 version - killua - 2013-08-28

I think it's possible, but not worth it.

You are basically delaying the jump into native material for months, while gaining very little in terms of learning efficiency.


Core 10k - optimized i+1 version - Xanpakuto - 2013-08-28

killua Wrote:
pmnox Wrote:How is your progress through core deck?
I'm going to finish the first 2k in a couple of days.
I'm doing recognition only.
Geez, I'm doing a hundred words a day and I think that's a lot. Core 2k in a couple of days? Isn't Anki going to explode with reviews?


Core 10k - optimized i+1 version - Vempele - 2013-08-28

It doesn't explode from adding 1700 cards at once, and Core2k is barely bigger than that. Smile


Core 10k - optimized i+1 version - killua - 2013-08-28

I think you got it wrong. I started a little less than two weeks ago and I'm gonna do the last ~300 in a couple of days! Big Grin