
Core 10k - optimized i+1 version

uisukii Wrote:I ran into syncing issues with the desktop client and it corrupted the database, forcing me to restore from a backup collection. It happened a few times in a row. Maybe it's just something on my end.

Oh well, I'm still really keen to sink my teeth into the 6k-10k section of this deck after I finish off nukemarine's current Core 2k/6k optimized Japanese vocabulary deck. Looks too good to pass up.
You could try dropping some columns that you don't use. That might solve the issue, since you won't have to transfer as much data.
Tried syncing over LTE and got much better results: it would sync 50-150 MB at a time instead of 4-20. It's all good now, but I'm wondering exactly how i+1 the default order is.

The first card was "internet," but the example sentence for it also contained several other words, none of which are used in any other sentence out of the initial 12 or so. Is the default order incorrect? Should I be sorting them by some field?

The main draw of this deck for me was the apparent i+1 optimization, but I'm not sure how to actually get the order to reflect that. My experience so far has been "maybe one thing I know plus 6 things I don't know," and it's been like that for practically every single card.
Edited: 2013-09-04, 9:16 am
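One way to check how "i+1" a given order really is: walk the cards in study order and count how many words in each example sentence are still unknown when the card comes up. A minimal Python sketch, assuming each sentence has already been split into words (in practice that needs a tokenizer such as MeCab; the word lists below are made up for illustration) and assuming that only a card's own target word counts as learned once it has been studied:

```python
def unknown_counts(cards, known=()):
    """For each card in study order, count the sentence words that are still unknown.

    cards: list of (target_word, sentence_words) tuples, in study order.
    Assumption: after a card is studied, only its target word counts as known.
    An ideal i+1 card has exactly one unknown word: its own target.
    """
    known = set(known)
    counts = []
    for target, sentence_words in cards:
        counts.append(len(set(sentence_words) - known))
        known.add(target)
    return counts


# Hypothetical, hand-segmented example data (not taken from the deck).
cards = [
    ("一つ", ["一つ"]),
    ("二つ", ["一つ", "二つ"]),
    ("インターネット", ["インターネット", "便利", "使う"]),
]
print(unknown_counts(cards))
```

A mostly-1 result means the order is close to i+1; long runs of 3+ unknown words are the "one thing I know plus 6 things I don't" experience.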
Quote:The first card was "internet," but the example sentence for it also contained several other words, none of which are used in any other sentence out of the initial 12 or so. Is the default order incorrect? Should I be sorting them by some field?
The first card is "one (things)" here.
I enforced the "due" order with Browser -> Core06k -> Select All -> Reposition. I'm not sure why, but every time I import a new deck I have to do it.

Note that this is a vocabulary deck, so you should focus on the word... the sentences are there for context only.
I think the i+1 order refers to kanji and readings.
Edited: 2013-09-04, 9:36 am
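Conceptually, Browser -> Select All -> Reposition renumbers the new-card queue so it follows the order the cards are sorted in. A rough Python model of that idea, assuming the intended order lives in a numeric field; the `Card` class and the `sort_index` name are hypothetical stand-ins, not Anki's API, and the index values are made up:

```python
from dataclasses import dataclass


@dataclass
class Card:
    word: str
    sort_index: int  # hypothetical: value of the deck's optimized-order field
    due: int         # position in the new-card queue


def reposition(cards, start=1):
    """Give cards consecutive new-card positions following sort_index."""
    ordered = sorted(cards, key=lambda c: c.sort_index)
    for position, card in enumerate(ordered, start=start):
        card.due = position
    return ordered


# Imported in the "wrong" order: internet first.
cards = [Card("インターネット", 812, 1), Card("一つ", 1, 2), Card("二つ", 2, 3)]
for card in reposition(cards):
    print(card.due, card.word)
```

Which field to sort by (default index, vocab index, or sentence index) is exactly what decides the resulting order.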
killua Wrote:
Quote:The first card was "internet," but the example sentence for it also contained several other words, none of which are used in any other sentence out of the initial 12 or so. Is the default order incorrect? Should I be sorting them by some field?
The first card is "one (things)" here.
I enforced the "due" order with Browser -> Core06k -> Select All -> Reposition. I'm not sure why, but every time I import a new deck I have to do it.

Note that this is a vocabulary deck, so you should focus on the word... the sentences are there for context only.
I think the i+1 order refers to kanji and readings.
I'll try a reposition, then.

Honestly, it doesn't matter that much to me what the i+1 index is, as long as there is one. There was no i+1 in kanji, readings, meanings, grammar, Heisig keywords... basically anything that I could see. If the issue is that the default index upon initial import is incorrect, then pmnox, you might wanna mention that in the instructions on the AnkiWeb page.

I'm glad this is a vocabulary deck, as that's what I'm trying to focus on. At least 10 words a day (in addition to 25 kanji per day in my RtK deck) to start, but once I try the properly-sorted version, I might ramp that up.
I think you guys should consider compressing those images and audio files. Half a gig is just ridiculous for a single deck...
You've got to be able to use a lower bit rate / resolution for those audio files / images. If this deck gets popular and you've got hundreds of people syncing gigabytes at a time, our nice free syncing service with Anki won't be free much longer.
Haych Wrote:I think you guys should consider compressing those images and audio files. Half a gig is just ridiculous for a single deck...
You've got to be able to use a lower bit rate / resolution for those audio files / images. If this deck gets popular and you've got hundreds of people syncing gigabytes at a time, our nice free syncing service with Anki won't be free much longer.
I imagine that in the worst case, we'll have to use Dropbox for media or something.

We only really need to sync once per install.
vgambit Wrote:
I'll try a reposition, then.

Honestly, it doesn't matter that much to me what the i+1 index is, as long as there is one. There was no i+1 in kanji, readings, meanings, grammar, Heisig keywords... basically anything that I could see. If the issue is that the default index upon initial import is incorrect, then pmnox, you might wanna mention that in the instructions on the AnkiWeb page.

I'm glad this is a vocabulary deck, as that's what I'm trying to focus on. At least 10 words a day (in addition to 25 kanji per day in my RtK deck) to start, but once I try the properly-sorted version, I might ramp that up.
After downloading the deck, was the ordering done by the default index? If not, I'll have to add a short guide explaining how to use "Browser -> Core06k -> Select All -> Reposition."

I feel that I should add a short guide on how to use this deck, explaining common issues and the two or three most important indexes: the default one, the original one used for sorting by vocab, and the original one used for sorting by sentence.

Haych Wrote:I think you guys should consider compressing those images and audio files. Half a gig is just ridiculous for a single deck...
You've got to be able to use a lower bit rate / resolution for those audio files / images. If this deck gets popular and you've got hundreds of people syncing gigabytes at a time, our nice free syncing service with Anki won't be free much longer.
You can disable media syncing in Anki. Maybe I should add that information to the guide as well. Audio files take up most of the space, about 80% of the total. I'll have to find some software to lower the bit rate. The current bit rate is 160kbps; should I lower it to 96kbps?
Edited: 2013-09-04, 2:47 pm
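For constant-bitrate MP3, file size is roughly bitrate × duration / 8 (ignoring headers and ID3 tags), so size scales linearly with bitrate. A small Python helper to make that concrete, using a 3-second clip purely as an assumed example length:

```python
def mp3_size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Approximate size of a constant-bitrate MP3, ignoring headers and tags."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8


for kbps in (160, 96):
    # 3-second clip is an assumed example duration
    print(kbps, "kbps:", mp3_size_bytes(kbps, 3) / 1000, "kB")
```

Variable-bitrate files won't follow this exactly, but the proportionality is what matters when deciding whether re-encoding is worth it.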
pmnox Wrote:
Haych Wrote:I think you guys should consider compressing those images and audio files. Half a gig is just ridiculous for a single deck...
You've got to be able to use a lower bit rate / resolution for those audio files / images. If this deck gets popular and you've got hundreds of people syncing gigabytes at a time, our nice free syncing service with Anki won't be free much longer.
You can disable media syncing in Anki. Maybe I should add that information to the guide as well. Audio files take up most of the space, about 80% of the total. I'll have to find some software to lower the bit rate. The current bit rate is 160kbps; should I lower it to 96kbps?
Yikes, that's going to take a while. Thousands of audio files to compress ;/ But yes, lowering the bit rate will reduce the file size. Just don't lower it too much, or we won't be able to understand the speaker ^^
Please don't recompress already compressed audio files. :(

Moreover, 96kbps will still produce a huge deck (450-500MB I guess), so it doesn't solve the problem.
It's better to show people how to avoid syncing media, in my opinion.
Edited: 2013-09-04, 4:23 pm
It should be far less than 500MB if he compresses it that far. It's funny, because when I rip songs they're at 1000+kbps, and 10 songs come to about 500MB.
Look, 160kbps to 96kbps, the calculation is really easy. ;)
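Spelled out: at the same duration, a constant-bitrate file's size is proportional to its bitrate (ignoring container and tag overhead), so for a file currently encoded at 160 kbps

```latex
\frac{S_{96}}{S_{160}} = \frac{96\ \text{kbps}}{160\ \text{kbps}} = 0.6
```

i.e. about 40% smaller per file. That saving only applies to files that are actually at 160 kbps; files already at 96 kbps would not shrink at all.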
Xanpakuto Wrote:It should be far less than 500MB if he compresses it that far. It's funny, because when I rip songs they're at 1000+kbps, and 10 songs come to about 500MB.
There are about 20000 audio files there. Even if the average is only about 3 seconds, that gives you 60000 seconds of audio.

10 songs of about 5 minutes each would be 10*5*60 = 3000 seconds of audio. If the deck's audio files had CD quality, you would need at least 10GB of space.
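The comparison can be checked against uncompressed CD audio (44.1 kHz, 16-bit, stereo, i.e. 176,400 bytes per second). A quick Python check using the thread's own rough estimates (about 20000 clips at about 3 seconds each, and 10 songs at about 5 minutes each):

```python
# 44.1 kHz sample rate * 2 bytes (16-bit) per sample * 2 channels (stereo)
CD_BYTES_PER_SECOND = 44_100 * 2 * 2


def cd_quality_bytes(seconds: int) -> int:
    """Size of uncompressed CD-quality PCM audio of the given length."""
    return seconds * CD_BYTES_PER_SECOND


songs_seconds = 10 * 5 * 60   # 10 songs of about 5 minutes
deck_seconds = 20_000 * 3     # about 20000 clips averaging about 3 seconds (rough estimate)

print(cd_quality_bytes(songs_seconds) / 1e6, "MB")  # about 529.2 MB
print(cd_quality_bytes(deck_seconds) / 1e9, "GB")   # about 10.6 GB
```

So 10 CD-quality songs really are roughly 500MB, and the deck holds about 20 times as much audio by duration.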

I'm not sure if it is worth reducing bit rate from 160 to 96 just to reduce the file size by 20-30%.

EDIT:
I forgot to mention that 6000*2 files are at 96 kbps, 3800*2 files are at 160 kbps.
So I guess the only real gain would be if I compressed everything to 48 kbps, but I don't want to lower the quality that much.
Edited: 2013-09-04, 5:57 pm
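With that split, the expected saving can be estimated by weighting each group by file count × bitrate. This sketch assumes (it isn't stated in the thread) that clips at both bitrates have the same average length, so it is only a rough estimate:

```python
def relative_size(groups):
    """Sum of file_count * bitrate_kbps over (file_count, bitrate_kbps) groups.

    Proportional to total audio size only if every group has the same average
    clip length (an assumption).
    """
    return sum(count * kbps for count, kbps in groups)


def saving(plan, baseline):
    """Fraction by which `plan` is smaller than `baseline`."""
    return 1 - relative_size(plan) / relative_size(baseline)


current = [(6000 * 2, 96), (3800 * 2, 160)]
only_160_to_96 = [(6000 * 2 + 3800 * 2, 96)]
everything_to_48 = [(6000 * 2 + 3800 * 2, 48)]

for label, plan in [("160 -> 96", only_160_to_96), ("all -> 48", everything_to_48)]:
    print(f"{label}: about {saving(plan, current):.1%} smaller")
```

Under that assumption, re-encoding only the 160 kbps files to 96 kbps saves roughly 20% of the audio size, while going to 48 kbps everywhere saves roughly 60%.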
Quote:I'm not sure if it is worth reducing bit rate from 160 to 96 just to reduce the file size by 20-30%.
This.

The deck is big because it has a lot of content. There isn't much we can do about it.
But we can find some ways around it. One could manually transfer the media files to the mobile device, I guess...
You're right, 160 is pretty low. You could do what nukemarine's deck does and host all the media on MediaFire or Dropbox. Of course, it wouldn't really solve the problem for mobile clients, but it would work for desktop. It would probably be a faster download, too. The only problem is that someone would have to maintain the link.

And I agree, you should put a note explaining how to turn off media sync in the deck info.
Quote:it wouldn't really solve the problem for mobile clients
Why not? Just copy the media folder to the device.

http://code.google.com/p/ankidroid/wiki/...AnkiDroid_?
Edited: 2013-09-04, 6:20 pm
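Copying by hand just means mirroring the desktop media folder onto the device. A hedged Python sketch that copies only files that are missing (or differ in size) on the destination, so a second run after adding a few cards is quick; both paths in the usage comment are placeholders, since the real locations depend on your OS and AnkiDroid setup (see the AnkiDroid wiki page linked above):

```python
import shutil
from pathlib import Path


def copy_missing_media(src: Path, dst: Path) -> int:
    """Copy files from src into dst, skipping files already present with the same size.

    Returns the number of files copied. Subfolders are ignored.
    """
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for f in src.iterdir():
        if not f.is_file():
            continue
        target = dst / f.name
        if target.exists() and target.stat().st_size == f.stat().st_size:
            continue  # already on the device
        shutil.copy2(f, target)  # copy2 also preserves timestamps
        copied += 1
    return copied


# Example (placeholder paths, adjust to your setup):
# copy_missing_media(Path("path/to/desktop/media"), Path("path/to/device/media"))
```

Comparing sizes instead of hashing keeps it fast on thousands of files, at the cost of missing a changed file that happens to keep the same size.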
Haych Wrote:You're right, 160 is pretty low. You could do what nukemarine's deck does and host all the media on MediaFire or Dropbox. Of course, it wouldn't really solve the problem for mobile clients, but it would work for desktop. It would probably be a faster download, too. The only problem is that someone would have to maintain the link.

And I agree, you should put a note explaining how to turn off media sync in the deck info.
Wouldn't you get the same syncing issue if you used nukemarine's deck and added media to it? He doesn't have any explanation about turning off media sync in his deck's info. You would have to sync the whole deck if you didn't turn media sync off.

I'm using the Japanese version of Anki. I'll have to reinstall it just to make this guide.
Edited: 2013-09-04, 6:27 pm
pmnox Wrote:After downloading the deck, was the ordering done by the default index? If not, I'll have to add a short guide explaining how to use "Browser -> Core06k -> Select All -> Reposition."

I feel that I should add a short guide on how to use this deck, explaining common issues and the two or three most important indexes: the default one, the original one used for sorting by vocab, and the original one used for sorting by sentence.
I'm not sure what the default index is, but the first word in it was "internet." After I did a Reposition, the first word was "one (thing)." A short guide on indices and how to set them would be great. The same goes for the Reposition.

As for the files, I say leave them as-is. The speakers are already often difficult to hear when they pitch downward.
Hello guys. I'm not new here, although this is my first post.

Btw, great job!!


Well, I would like to know more about that 3% that is wrong. I'd like to know exactly where the problem is and how I could fix it. I'm asking because I didn't understand the exact problem.

For example, you say that the English sentence and the English word are wrong. Does that mean those two fields are the only things that are wrong, e.g. the English word is "car" and the English sentence is "I'm so cold"? Or are there more things that don't match, such as the Japanese word (kanji/kana), the Japanese sentence (kanji/kana), etc.?

Since I'm going to translate every word into Spanish, I'm going to see them all, and depending on what the problem is, I may or may not be able to fix it.

Also, if you know which numbered words are wrong (first field), you just have to go to the iKnow website and fix them.


I'm going to start with the deck, but I don't know whether I should use kore_2k/6k or this one. It depends on what you can tell me about that 3%.


I see two advantages in this deck, in comparison to kore_2k6k.

- The order of the words (first field of your deck), which is the new order that iKnow now uses. kore_2k6k uses the old order.
- And second, the images from 2000 to 6000. I would change some of them, but you have a lot there that I could already use.

- Actually, there is a third one: better or additional translations for some words (not for me, but still). Maybe they changed some sentences too (better translations, I guess); I don't know.


Greetings!!
Edited: 2013-09-05, 7:45 am
killua Wrote:
Quote:it wouldn't really solve the problem for mobile clients
Why not? Just copy the media folder to the device.

http://code.google.com/p/ankidroid/wiki/...AnkiDroid_?
I would recommend that. I switch off media sync, as it has a long history of issues on AnkiDroid. It doesn't take much for it to want to do a full sync again, which can take ages with big decks full of audio (e.g. if you add some Subs2SRS decks alongside this one).
pmnox Wrote:I forgot to mention that 6000*2 files are at 96 kbps, 3800*2 files are at 160 kbps.
The files with a 160 kb/s bitrate were converted to MP3 from AAC files, which probably had a lower bitrate. See the japanese 先生 = core 10000? thread. I couldn't find the original AAC files anywhere; does someone still have them?
Edited: 2013-09-05, 12:14 pm
Isinaki Wrote:Hello guys. I'm not new here, although this is my first post.

Btw, great job!!


Well, I would like to know more about that 3% that is wrong. I'd like to know exactly where the problem is and how I could fix it. I'm asking because I didn't understand the exact problem.
Sorry about the confusion.

Let me explain from the beginning. There are two versions of this deck: the original one from smart.fm and the newer one that was ripped from iKnow (core-6000.txt). The version from core-6000.txt fixes a lot of issues like mistranslations, wrong dots, duplicates, etc. In general it is better than the old one. About 50 words were removed because they were duplicates. About 150 words use a completely different sentence.

The first version of the deck was based on cards from smart.fm. At that time I didn't know that the newer core-6000.txt deck contains audio as well. So I tried to match the sentences from the smart.fm deck to the ones from iKnow. I was only able to match 5800 cards out of 6000, so about 3% of the cards didn't have sound.

However, I remade the whole deck from core-6000.txt, so it doesn't contain the original entries from last.fm anymore. There is nothing wrong with the current version of the deck.
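The post doesn't say exactly how the matching was done. As an illustration only, a minimal exact-match approach in Python: normalize each sentence (the punctuation set is an assumption), look it up in the other version, and collect whatever fails to match. The sample sentences are made up:

```python
PUNCTUATION = "。、！？「」"  # assumed set of characters to ignore when comparing


def normalize(sentence: str) -> str:
    """Drop whitespace (including full-width spaces) and the assumed punctuation."""
    return "".join(ch for ch in sentence if not ch.isspace() and ch not in PUNCTUATION)


def match_sentences(old, new):
    """Pair each old sentence with a new sentence that is identical after normalization."""
    lookup = {normalize(s): s for s in new}
    matched, unmatched = {}, []
    for sentence in old:
        key = normalize(sentence)
        if key in lookup:
            matched[sentence] = lookup[key]
        else:
            unmatched.append(sentence)
    return matched, unmatched


old = ["今日は 暑い。", "犬が好きです。", "猫がいる。"]
new = ["今日は暑い。", "犬が大好きです。"]
matched, unmatched = match_sentences(old, new)
print(len(matched), "matched;", len(unmatched), "unmatched:", unmatched)
```

Any sentence that was reworded or replaced in the newer version falls into `unmatched`; with 5800 of 6000 matched, that is 200 / 6000, or about 3.3% of the cards.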

Isinaki Wrote:For example, you say that the English sentence and the English word are wrong. Does that mean those two fields are the only things that are wrong, e.g. the English word is "car" and the English sentence is "I'm so cold"? Or are there more things that don't match, such as the Japanese word (kanji/kana), the Japanese sentence (kanji/kana), etc.?

Since I'm going to translate every word into Spanish, I'm going to see them all, and depending on what the problem is, I may or may not be able to fix it.

Also, if you know which numbered words are wrong (first field), you just have to go to the iKnow website and fix them.
I had no idea there was an iKnow website. I'll go and check it out, thanks.
Quote:I'm going to start with the deck, but I don't know whether I should use kore_2k/6k or this one. It depends on what you can tell me about that 3%.

I see two advantages in this deck, in comparison to kore_2k6k.

- The order of the words (first field of your deck), which is the new order that iKnow now uses. kore_2k6k uses the old order.
One of the reasons I made this deck was to introduce a better order (second column) than the one in kore_2k/6k.
Quote:- And second, the images from 2000 to 6000. I would change some of them, but you have a lot there that I could already use.
I agree that some of them could be better. They were taken from bing image search.
Quote:- Actually, there is a third one: better or additional translations for some words (not for me, but still). Maybe they changed some sentences too (better translations, I guess); I don't know.
Greetings!!
The other advantage is the filtered list of about 3800 sentences that come from Core10v4 but are not in kore2_6k.

Btw, I wrote a short guide explaining how to use the deck. I'm going to update the deck description tomorrow.
Edited: 2013-09-05, 3:55 pm
V14... Did you update the deck or just the description?
Hi pmnox. Thanks for answering.

pmnox Wrote:Let me explain from the beginning. There are two versions of this deck: the original one from smart.fm and the newer one that was ripped from iKnow (core-6000.txt). The version from core-6000.txt fixes a lot of issues like mistranslations, wrong dots, duplicates, etc. In general it is better than the old one. About 50 words were removed because they were duplicates. About 150 words use a completely different sentence.

The first version of the deck was based on cards from smart.fm. At that time I didn't know that the newer core-6000.txt deck contains audio as well. So I tried to match the sentences from the smart.fm deck to the ones from iKnow. I was only able to match 5800 cards out of 6000, so about 3% of the cards didn't have sound.

However, I remade the whole deck from core-6000.txt, so it doesn't contain the original entries from last.fm anymore. There is nothing wrong with the current version of the deck.
Well, that's great. I'll just have to translate the words into Spanish, then.



pmnox Wrote:I had no idea there was an iKnow website. I'll go and check it out, thanks.
You can see all the words there and take the audio and images for free. Since there isn't anything wrong with the deck now, there's no need, but still.



pmnox Wrote:One of the reasons I made this deck was to introduce a better order (second column) than the one in kore_2k/6k.
I saw it but didn't look closely. I used the order of the first field, which is iKnow's new order. It seems pretty good to me, although I've only looked at a few words at the beginning. kore_2k6k uses iKnow's old order, or rather smart.fm's.
I'll have to check yours, that second column. Thanks for the info!

pmnox Wrote:The other advantage is the filtered list of about 3800 sentences that come from Core10v4 but are not in kore2_6k.
Good to know!


Btw, what do you think about making two separate decks? (Maybe it's not possible because of your word order in the second field; I don't know.)


I'll start downloading v14 right now. If I'm not mistaken, the only thing left now is the pitch accent, so this will be my version :D


Greetings, and GJ!!
Edited: 2013-09-06, 5:24 am
killua Wrote:V14... Did you update the deck or just the description?
I changed just the description.
In the next version I'll include an additional field that contains a kanjified version of the word. I'm going to keep updating it as I progress through the deck.

some examples:
おもちゃ ー 玩具
ふすま ー 襖
ねずみ ー 鼠
もちろん ー 勿論
とんでもない ー 飛んでもない
おおげさ ー 大袈裟
つもり ー 積もり
etc.

Those are mostly older forms of words, but I have seen them being used from time to time. For me it's easier to remember the word when I see kanji that it uses.
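The extra field is essentially a lookup from a kana spelling to a kanji spelling. A minimal Python sketch, with the table seeded from the examples above and notes modeled as plain dicts with hypothetical `word`/`kanjified` keys (not the deck's actual field names):

```python
# Seeded from the examples in the post; extend as you go through the deck.
KANJI_FORMS = {
    "おもちゃ": "玩具",
    "ふすま": "襖",
    "ねずみ": "鼠",
    "もちろん": "勿論",
    "とんでもない": "飛んでもない",
    "おおげさ": "大袈裟",
    "つもり": "積もり",
}


def add_kanjified(notes, table=KANJI_FORMS):
    """Fill the hypothetical 'kanjified' key; leave it empty when no kanji form is known."""
    for note in notes:
        note["kanjified"] = table.get(note["word"], "")
    return notes


notes = add_kanjified([{"word": "ねずみ"}, {"word": "インターネット"}])
print(notes)
```

Leaving the field empty rather than copying the kana keeps it obvious which words still have no kanji form recorded.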

I'll add one more field that gives a hint useful for production cards:
インターネット ー KA
車 ー 1k
自動車 ー 3k
綺麗 ー 2k
美しい ー 1kh
etc.

I use it mostly to save time by not having to read the whole sentence to figure out which word it is.
Edited: 2013-09-06, 9:04 am