Stian wrote:
And Core sentences don't seem to be made in an n+1 way, which would be ideal.
What's n+1 for you is not necessarily n+1 for someone else, because not everyone knows the same words going in. It's impossible to create an n+1 deck for anyone except total beginners.
But Core can easily be made into n+1. All you need to do is SUSPEND any sentence you encounter that has two or more unknown words, and DELETE any sentence you encounter that has no unknown words.
Then, from time to time, look through the suspended sentences and unsuspend the ones that have become n+1.
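That triage rule can even be scripted. Here's a minimal Ruby sketch of the idea; the word list is made-up sample data, and the whitespace splitting is a stand-in for real morphological analysis (actual Japanese text would need a tokenizer such as MeCab, and real decks would come from an Anki export):

```ruby
require "set"

# Sample "known words" list -- stands in for your actual vocabulary.
KNOWN = %w[I eat fish every day].to_set

# Classify a sentence by how many unknown words it contains:
#   0 unknowns -> delete  (nothing left to learn)
#   1 unknown  -> keep    (a true n+1 sentence)
#   2+         -> suspend (revisit once more words are known)
def triage(sentence, known = KNOWN)
  unknown = sentence.split.reject { |w| known.include?(w) }
  case unknown.size
  when 0 then :delete
  when 1 then :keep
  else        :suspend
  end
end
```

Run over an exported sentence list, this sorts each card into one of the three buckets described above.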
That not everyone learns in the same fashion also means that the 6000 most common words each person may encounter aren't the same as in the Asahi newspaper. If you're into video games, you may notice a lot of words that don't show up in newspapers. Core is also J-E from beginning to end.
I'd rather learn with anki than from it. I get most of my sentences from native material - ranging from youtube comments to literature. Anki is only a tiny fraction of my language study, and I want it as related to it as possible.
This way, I only add sentences that are truly n+1. And the only thing limiting me from adding more is simply that I don't want my reviews to escalate; 100 reviews (=18-20 min) is ideal imo.
Last edited by Stian (2013 January 28, 5:17 am)
Stian wrote:
That not everyone learns in the same fashion also means that the 6000 most common words each person may encounter aren't the same as in the Asahi newspaper. If you're into video games, you may notice a lot of words that don't show up in newspapers. Core is also J-E from beginning to end.
I'd rather learn with anki than from it. I get most of my sentences from native material - ranging from youtube comments to literature. Anki is only a tiny fraction of my language study, and I want it as related to it as possible.
This way, I only add sentences that are truly n+1. And the only thing limiting me from adding more is simply that I don't want my reviews to escalate; 100 reviews (=18-20 min) is ideal imo.
OP was asking about ways to improve the Core deck. "don't use it" is not an answer to that.
Yes, you make valid points. I agree that after establishing a solid base, there's absolutely no need to continue drilling sentences like a madman to learn the language. Reading and immersion are enough, and the best way to speed the process up is by SRS-ing only stuff you come across naturally, and only lightly, like you said. But this is something I figured out by actually using various methods and seeing how they helped me. Others can and will do the same; no one's just gonna take our word for it. The first instinct people have when it comes to learning is to be masochistic, as per the false "no pain, no gain" premise. But we all learn, eventually.
In the meantime, let's help the OP make his Core deck as efficient as possible. That's what he's asking for.
P.S. But, since I got into it already, I'm gonna self-indulge and go off topic to explain what I mean by a solid base:
1. RTK
2. Tae Kim grammar examples - preferably the cloze-deletion version
3. A thousand or so pre-made sentences, but with everything that's not n+1 taken out, especially the things that are n+0. Core2k is the best because of the audio (and even the image, which helps draw your attention away from the English sentence, making the deck de facto Japanese-to-audio-and-picture rather than Japanese-to-English). While I'm not gonna bother (because I'm halfway done already, and I'm only planning on doing 1000 pre-made sentences, 1200 max), I also see the value of filtering out the stuff on lauri_ranta's list. Make the pain of the 1000 sentences really count by learning the most frequent vocab you possibly can.
Having gone through most of this already, I do think this amount of pain is in fact necessary. But no more.
egoplant wrote:
I'm wondering if anyone has something like the core deck...I want the manga version
I too was dissatisfied with the Core deck and started referring to it as Japanese for Boring People; I wanted to learn the word for "axe" before figuring out how to file taxes. Thankfully there's a handful of solutions for this.
If you're unaware of it, I couldn't recommend cb4960's subs2srs plugin more highly. You can literally make a deck full of thousands of sentences from anime (Nukemarine has some useful video guides on this). If you combine it with my MorphMan plugin you can achieve your goal in at least two different ways:
1) Use MorphMan to get all the morphemes used in the anime you enjoy. You can even add words from manga if you can dump the scans to a text file (e.g. OCR). Alternatively, you could just turn lauri_ranta's list into a MorphMan database. You can then use this database of "words you want to learn" to filter the Core deck by having it tag any Core sentence that contains one of those "good" words.
2) Alternatively, drop Core altogether and have MorphMan sort your subs2srs sentences by difficulty and only study i+1 sentences. You can even emulate a vocab deck by having the unknown word of your i+1 sentence filled into a separate field and then displayed in your layout.
The second option isn't 100% perfect because it doesn't take grammar and idioms into account when determining i+1-ness (it still does a better job than Core, based on my experience), but it has the benefit of being calculated dynamically based on your knowledge. Thus, if you want to skip a sentence for whatever reason, you're free to just suspend it and not worry, as dependencies on prior knowledge are always recalculated, unlike in static decks like Core.
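The dynamic recalculation idea can be pictured with a toy greedy loop. This is only an illustration of the concept, not MorphMan's actual code, and whitespace splitting again stands in for real morpheme analysis:

```ruby
require "set"

# Repeatedly pick a sentence with exactly one unknown word, "learn" its
# words, and recompute -- so the i+1 ordering adapts as knowledge grows,
# and skipped/suspended sentences simply never get picked.
def i_plus_1_order(sentences, known_words)
  known = known_words.to_set
  order = []
  remaining = sentences.dup
  loop do
    nxt = remaining.find { |s| (s.split.to_set - known).size == 1 }
    break unless nxt
    order << nxt
    remaining.delete(nxt)
    known.merge(nxt.split)  # the sentence's new word is now known
  end
  order
end
```

Because the ordering is recomputed from the current known-word set each pass, removing or suspending a sentence never leaves stale "prerequisites" behind, which is the advantage over a static deck.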
Stansfield123 wrote:
What's n+1 for you is not necessarily n+1 for someone else, because not everyone knows the same words going in. It's impossible to create an n+1 deck for anyone except total beginners.
This is exactly why static decks should be enhanced with personalized filtering/sorting (as option #1 above) or replaced with dynamic decks (as option #2 above).
Last edited by overture2112 (2013 January 28, 9:53 am)
@Stansfield123
I was just commenting on the "it's pre-made, therefore it's the ultimate choice" attitude. And I agree that the first sentences must come from a "boring" source. Personally, I ripped about 1000 from the Genki books.
Your plan is more or less identical to what mine was. I have ~2700 sentences in my deck, and about 1000 of them have only Japanese definitions.
You should consider making a transition into complete J-J, but going cold turkey into it might be demotivating.
Nukemarine wrote:
If it helps, I've hacked together a list based on CB's 5100 Japanese Novel Word Frequency. First, I took all the kana words from the top 10,000 and put those words into their own list. Though the first 100 are bound to be grammar words, these will be important to study.
I then took the first 10,000 kanji-only words and bunched them into groups of 500, 1500, 2000, 2000, 2000, and 2000. Each group was sorted by the KO2k1 order. Finally, I organized it so that the first 1000 words came from KO2k1 555, the next 1000 from KO2k1 1110, the next 2000 words based on the entire 2k1, and after that there was no filtering.
What that gives you is a go-to list of "should learn these" to help you read novels more easily. I don't think it's that difficult to use the equivalent Core 2k/6k/10k cards where there's a match, and fill in the missing info from the Kenkyusha dictionary, which I think was the go-to dictionary when they made the sample sentences for Core 2k/6k/10k.
Anyway, I'll put the list on Google Drive, and if it helps you then all is good. If you have a better frequency list, such as one based off thousands of manga or anime scripts, then it's not much work to sort it via the method I think works (group by frequency in bunches, sort those bunches via KO2k1 order, spread kana words throughout).
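The bunching scheme described here can be sketched roughly as follows. The `ko2k1_rank` hash is a made-up stand-in for the real KO2k1 ordering, and the kana-interleaving step is left out:

```ruby
# Slice a frequency-sorted word list into bunches, then re-sort each
# bunch by a secondary rank -- so the most frequent words still come
# early overall, but within each bunch the easier-to-learn order wins.
BUNCH_SIZES = [500, 1500, 2000, 2000, 2000, 2000]

def optimized_sort(freq_sorted_words, ko2k1_rank, sizes = BUNCH_SIZES)
  rest = freq_sorted_words.dup
  out = []
  sizes.each do |n|
    bunch = rest.shift(n)  # take the next n most frequent words
    # Words without a KO2k1 rank sort to the end of their bunch.
    out.concat(bunch.sort_by { |w| ko2k1_rank.fetch(w, Float::INFINITY) })
  end
  out + rest  # anything beyond the bunches keeps plain frequency order
end
```

Swapping in a different frequency list (manga, anime scripts, etc.) only changes the input array; the bunch-then-resort step stays the same.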
This sounds great. I was going to take a stab at the rest of the words in the Core 10000 list, outside of Core 6000, when you posted this. Have you calculated how many words in your new list are outside Core 6000/10000? I'm more interested in novels than newspapers, so maybe your word list is a better match for me. Though I'm a bit concerned about how much work it would take to add definitions and example sentences to it, since I'm not good at all with scripts. The good thing about Core 6000/10000 is that the words are already set up with definitions/examples/audio, etc.
(except for using subs2srs like overture was saying) all the no-audio solutions in this thread sound like great ways to learn how to read, and yet still not be able to hear or speak/pronounce anything.
PotbellyPig wrote:
Nukemarine wrote:
Anyway, I'll put the list on Google Drive, and if it helps you then all is good. If you have a better frequency list, such as one based off thousands of manga or anime scripts, then it's not much work to sort it via the method I think works (group by frequency in bunches, sort those bunches via KO2k1 order, spread kana words throughout).
This sounds great. I was going to take a stab at the rest of the words in the Core 10000 list, outside of Core 6000, when you posted this. Have you calculated how many words in your new list are outside Core 6000/10000? I'm more interested in novels than newspapers, so maybe your word list is a better match for me. Though I'm a bit concerned about how much work it would take to add definitions and example sentences to it, since I'm not good at all with scripts. The good thing about Core 6000/10000 is that the words are already set up with definitions/examples/audio, etc.
Here's the current Optimized Frequency Sort spreadsheet.
I'm sure it's no problem matching the Core 10k list with this to add the optimized by book frequency index. Pretty sure the results would be great for most students.
Let me clarify that any of these lists are great. It's just a matter of whether the student wants a word list based on frequency in newspapers, books, anime subtitles, drama subtitles, all subtitles, or some other media. The core of it lies in the frequency of words, the sub-grouping of those words into 500, then 1500, and finally groups of 2000 words, and the KO2k1 sorting of those subgroups with kana-only words evenly spread out. After that, it's a matter of seeing if the word happens to have a Core 10k entry for an easy-to-make flash card.
Given Anki 2.0's easier way to resort new cards, it shouldn't be that hard to reorder your existing vocab deck if you feel a different optimized list is up your alley.
dtcamero wrote:
(except for using subs2srs like overture was saying) all the no-audio solutions in this thread sound like great ways to learn how to read, and yet still not be able to hear or speak/pronounce anything.
Listening to native media covers the hearing part. Perhaps 100% of your Japanese exposure was from Anki, but some of us listen to native material for hours every day.
@ Lauri Ranta:
願書 application form
鰻 eel
お辞儀 bow
細長い long and thin, long and narrow
時給 hourly wage
At least these words are fairly common in my experience (N2, studying for N1) and I've heard them more than a few times in anime (2-4) or in real life/web (1 and 5).
If you don't consider 1 and 5, and ignore unagi because it's more common in kana, at least ojigi and hosonagai should be added imho. As you were saying, maybe the small sample has taken its toll on the accuracy of the list...
On another note, I've seen your website and I was quite impressed by all that useful material!
Do you think you can add something for onomatopoeia ordered by frequency in the same way you just did? *_*
Stian wrote:
Listening to native media covers the hearing part. Perhaps 100% of your Japanese exposure was from Anki, but some of us listen to native material for hours every day.
How do you know what covers what? Are you fluent yet? Has your Japanese project ended already with a successful result? Have you peeked inside your brain recently to see that your nourishment of one thing but not another was in fact ideal?
Ya sure you can do that. You can raise a child in an orphanage but with two parents you get a better outcome. I'm not saying you personally have to, just that it might be a good idea to consider.
And come on of course i listen to hours of media too...the only person who would be doing all of his/her listening in Anki is some kind of binary all-or-nothing study-fascist...like the guy who would do all his listening outside of Anki... or... erm...
Well:
http://japaneselevelup.com/
This fellow used the same method, and he seems to be doing fine. Also, Khatz didn't use a premade anki deck either.
I'm not fluent yet, but I have learnt English without using a premade anki deck, and I turned out just fine.
You can also move away from your parents when you turn 18, or you can when you turn 40.
And come on of course i listen to hours of media too...the only person who would be doing all of his/her listening in Anki is some kind of binary all-or-nothing study-fascist...like the guy who would do all his listening outside of Anki... or... erm...
Is that just some lame joke...?
I feel like I'm talking to my grandpa who isn't tracking the conversation... I'm talking about putting audio in flashcards, not pre-made decks. ok we up to speed?
FYI, your lord and master Khatz (so probably also his henchman at jlevelup) is one of the biggest proponents of audio in flashcards.
http://www.alljapaneseallthetime.com/bl … -ramblings
http://www.alljapaneseallthetime.com/bl … 1254937562
http://www.alljapaneseallthetime.com/bl … 1071887830
http://www.alljapaneseallthetime.com/bl … 7350014159
http://www.alljapaneseallthetime.com/bl … 7303766835
Khatz wrote:
The new holy grail!
1. Rip movie to audio
2. Split into 2~5-second clips
3. Use clips to make audio flashcards
Manual splitting gruntwork=0!
If I can be permitted to guess why he didn't do it more, it's probably because he was doing the splitting himself (which ironically is still way more gruntwork than required today) and because he was hyping his own SRS program for personal/commercial reasons.
Moreover, your displayed English is full of caustic overreactions and misunderstandings-leading-to-attacks on my previous posts, so I'm not really so convinced that your study methods are optimized, sir.
Oh... I thought you were backing up the part of your point which I argued against in the first place:
"not to mention the fact that it's pre-made, make it heads and tails a winner?"
My fault...
And are TTSs reliable? Seeing that most of my sentences are from written sources....
Last edited by Stian (2013 January 29, 11:54 am)
Ya, I just read about some guys taking snippets out of context from larger texts and using them to justify their arguments... I think they were blowing up health clinics or something.
I wouldn't use TTS. I appreciate the desire to use texts, but I think the value of connecting what you're learning to the media you're passively listening to is of such significance that using a slightly less-optimal source (like drama w/ subs2srs for example) is worth it.
Last edited by dtcamero (2013 January 29, 12:03 pm)
Ya, I just read about some guys taking snippets out of context from larger texts and using them to justify their arguments... I think they were blowing up health clinics or something.
This example would only be valid if I downloaded Core and deleted my anki deck. Your strawman skills aren't really optimised.
But thanks for the heads up... my network speed is suboptimal, but I should try to do that with the drama I've been downloading the last four days. :p
However, I find that I am able to pick up the words I have in my deck even if I don't have audio samples.
Last edited by Stian (2013 January 29, 12:22 pm)
Stian wrote:
But thanks for the heads up... my network speed is suboptimal, but I should try to do that with the drama I've been downloading the last four days.
Quick solution: never underestimate the bandwidth of FedEx carrying DVDs from YesAsia.
Stian wrote:
However, I find that I am able to pick up the words I have in my deck even if I don't have audio samples.
I don't think dtcamero is implying it's impossible, but rather that you can make it easier on yourself (and thus be more efficient, i.e. learn more in the same time/effort spent studying) if your deck helped you immediately and directly associate the sounds with a native's rendering.
Nukemarine wrote:
Here's the current Optimized Frequency Sort spreadsheet.
I'm sure it's no problem matching the Core 10k list with this to add the optimized by book frequency index. Pretty sure the results would be great for most students.
Let me clarify that any of these lists are great. It's just a matter of whether the student wants a word list based on frequency in newspapers, books, anime subtitles, drama subtitles, all subtitles, or some other media. The core of it lies in the frequency of words, the sub-grouping of those words into 500, then 1500, and finally groups of 2000 words, and the KO2k1 sorting of those subgroups with kana-only words evenly spread out. After that, it's a matter of seeing if the word happens to have a Core 10k entry for an easy-to-make flash card.
Given Anki 2.0's easier way to resort new cards, it shouldn't be that hard to reorder your existing vocab deck if you feel a different optimized list is up your alley.
Thanks for the list. I went into Excel and did a quick check, and there seem to be 14,000 unique words in total (combining Core 10,000 with your 10,000-word list and removing duplicates). That's a lot. Do you know any way of fairly easily adding the definition, kana reading, parts of speech and maybe an example sentence for each word? The one way I can think of is to export the spreadsheet to a text file, load it into Rikaisama and click on each word to create an Anki card.
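The duplicate check itself is a one-liner to script; here's a tiny sketch with made-up mini-lists standing in for the two 10,000-word lists:

```ruby
# Union of two word lists with duplicates removed, as in the Excel check.
core   = %w[犬 猫 鳥]
novels = %w[猫 鳥 魚 熊]
combined = (core + novels).uniq
puts combined.size  # the number of unique words across both lists
```

With the real Core 10k and the frequency list pasted in, `combined.size` is the 14,000-word figure.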
PotbellyPig wrote:
Do you know any way of fairly easily adding the definition, kana reading, parts of speech and maybe an example sentence for each word?
I would import them straight from the spreadsheet into an Anki deck, then generate readings via the Japanese plugin, example sentences via one of the example-sentence plugins, definitions from one of the gloss plugins (I haven't updated mine from Anki 1 yet, but I assume someone else has made one by now?), etc. Alternatively, you could do it mostly outside of Anki via some of cangy's scripts if you aren't afraid of CLIs.
Not sure how to "easily" get the POS though. You can probably get it along with the definition from a glosser, but not as its own field. I don't think I ever exposed a way to easily store the POS or sub-POS data from MorphMan into a field either, despite storing that data.
Just be careful with the Japanese support plugin; it might not always generate correct furigana.
PotbellyPig wrote:
Do you know any way of fairly easily adding the definition, kana reading, parts of speech and maybe an example sentence for each word? The one way I can think of is to export the spreadsheet to a text file, load it into Rikaisama and click on each word to create an Anki card.
Epwing2Anki was created for this kind of task.
cb4960 wrote:
Epwing2Anki was created for this kind of task.
I can recommend this, I use it all the time. I just add a bunch of words to a text file when I'm reading, and then use epwing2anki and import them.
egoplant wrote:
cb4960 wrote:
Epwing2Anki was created for this kind of task.
I can recommend this, I use it all the time. I just add a bunch of words to a text file when I'm reading, and then use epwing2anki and import them.
Great! I'll try my hand at it. I figure I'm willing to add maybe 3000-4000 cards from one of these lists. I've already gone through Core 6000. I'm into reading light novels so I figure that the list from regular novels would be more appropriate than a list generated from newspapers. But I'm undecided. The rest of Core 10000 has audio and I also feel there is a sense of accomplishment from completing Core 10000 in its entirety. I'll use this tool to generate the meanings for words outside of Core 6000 and try to decide.
PotbellyPig wrote:
Do you know any way of fairly easily adding the definition, kana reading, parts of speech and maybe an example sentence for each word?
I currently get kana versions and translations from edict:
# Look up readings and glosses in EDICT for words I haven't learned yet.
edict = IO.read("edict_sub.txt")
# Words to process: input list minus already-learned words, deduplicated.
(IO.read("input.txt").split("\n") - IO.read("learned.txt").split("\n")).uniq.shuffle.each { |line|
  # EDICT lines look like: WORD [reading] /(tag) gloss/...
  # Regexp.escape keeps words with regex metacharacters from breaking the match.
  scan = edict.scan(/^#{Regexp.escape(line)} \[(.*?)\] \/(?:\(.*?\) )*(.*?)(?: \(.*?\))?\//)[0]
  next unless scan
  puts line + "\t" + scan[0] + "\t" + scan[1]  # word, reading, gloss (TSV)
}
If a word is not in edict_sub (a subset of EDICT for about 20,000 priority entries), it's likely that I shouldn't learn it yet either.
kazeatari wrote:
At least these words are fairly common in my experience (N2, studying for N1) and I've heard them more than a few times in anime (2-4) or in real life/web (1 and 5).
If you don't consider 1 and 5, and ignore unagi because it's more common in kana, at least ojigi and hosonagai should be added imho. As you were saying, maybe the small sample has taken its toll on the accuracy of the list...
On another note, I've seen your website and I was quite impressed by all that useful material!
Do you think you can add something for onomatopoeia ordered by frequency in the same way you just did? *_*
I don't want to remove or add words manually, but I'll add (hopefully more accurate) word frequencies to my Core 6000 text file at some point.
I recently made a list of hiragana-only words to review. (It's in rōmaji because I used it as input in a typing-training application.) There is also a version of EDICT with word frequencies indicated by Yahoo search hits in blog.goo.ne.jp. Here are all words tagged with on-mim (onomatopoeia / mimetic) sorted by frequency: http://lri.me/upload/edict-freq-on-mim.txt.
Last edited by lauri_ranta (2013 February 08, 3:55 pm)
I run into plenty of words in my daily life that aren't in the 20k most common. If a word shows up occasionally, wherever you are seeing it, you shouldn't hesitate to add it just because it's not in the 20k.