Textbook mining or core6k?

Reply #1 - 2010 June 23, 3:40 pm
Xero Xenith New member
Registered: 2010-06-05 Posts: 9

(Long-time beginner here, starting to take Japanese seriously! :) Apologies if this is in the wrong forum.)

I have a simple question: textbook mining with Anki, or core6k with Anki?

My eventual goal (obviously) is proficiency in Japanese, with my milestones along the way being the JLPT series - I'm at around 3-kyuu now - and the British Edexcel Japanese exams. I have the textbooks for these, and they're full of example sentences and other wonders easily exploitable by Anki. core6k is also full of example sentences, and is even easier to exploit by Anki!

However, I've noticed that the core6k only has 555 words tagged "JLPT4", while the official vocab list dictates 727, and 510 tagged "JLPT3", with the official figure at 681. That's 343 words not accounted for - 25%! Is it wise to trust core6k when a quarter of the words I'll likely need are missing?
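For anyone who wants to check the arithmetic, here is a quick sketch in Python. It uses only the four counts quoted above; the variable names are just for illustration.

```python
# (words tagged in core6k, words on the official list), as quoted in the post
counts = {"JLPT4": (555, 727), "JLPT3": (510, 681)}

# Words missing per level, then overall
missing = {level: official - tagged for level, (tagged, official) in counts.items()}
total_missing = sum(missing.values())
total_official = sum(official for _, official in counts.values())
share_missing = total_missing / total_official

print(missing)                        # {'JLPT4': 172, 'JLPT3': 171}
print(total_missing, total_official)  # 343 1408
print(f"{share_missing:.1%}")         # 24.4%
```

So "25%" is a slight round-up of roughly 24.4%, measured against the combined official lists for the two levels.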

From what I've heard, core6k is an excellent resource - but I'm very confused as to how relevant it would be to the milestones I've chosen.

Any ideas would be much appreciated! :)

Reply #2 - 2010 June 23, 4:06 pm
thurd Member
From: Poland Registered: 2009-04-07 Posts: 756

Just mash those two lists together and you will have them all.

I can't seem to find my post where I put the numbers in, but JLPT3&4 + Core2000 is something over 2000 words. Add Core6000 to that (the one from smart.fm) and it will shoot up to the 5400 level. Add another milestone, JLPT2, and it's 1200 more, for a total of about 6600.

Just go from the beginning: JLPT4, then 3. It's not much, but at that point any JLPT3 workbook is accessible to you. They are graded quite nicely, and the words overlap a lot, so each time you add another step you already have the knowledge for a bigger part of it. You just have to take the first step.

Reply #3 - 2010 June 23, 4:52 pm
Xero Xenith New member
Registered: 2010-06-05 Posts: 9

Hey, thanks a lot for the reply! Would this be the post you mean?

That sounds like a nice simple way of doing things, but does it mean that core6k is missing a lot of common words designated for absolute beginners? Or does 4-kyuu include a lot of words that aren't common at all? That's the main worry holding me back from just going all-out.

Also, I'm a bit confused by your numbers - core6k means 6000 words (right?), so how does JLPT3&4, Core2k and Core6k all combined make 5500?

Thanks very much :)

Reply #4 - 2010 June 23, 8:23 pm
Nukemarine Member
From: 神奈川 Registered: 2007-07-15 Posts: 2347

If you're worried about passing a JLPT level, then just use the words for the JLPT. The Core lists are derived from a frequency list of Japanese newspapers, which, despite the inherent problems with that, still makes for a good list of words to learn for literacy.

But the JLPT and Core are not the only word lists. There's the MEXT Vocabulary, where if you just look at the "JLPT 2" kanji (about number 1110) you'll see that over 6600 words are covered, a few of which are dupes. Hell, some people don't even use word lists at all. They just sentence mine and learn words as they come up. With such a method, common words will float to the top just by virtue of being in every area of expertise and subject.

Don't worry about an all-inclusive list. They don't exist. Look for something that helps you learn new words. For many, that's the Core list, as it's got sample sentences, sample dialogue, can be sorted via Cangy's program, etc. If you find the JLPT list more useful, then use that. Just know that after 2000 or 4000 or even 6000 words you're better off than at the previous step, no matter your word list choice.

Reply #5 - 2010 June 24, 8:25 am
Xero Xenith New member
Registered: 2010-06-05 Posts: 9

Thanks Nukemarine, that's some good advice! I've had a look over the core6k list, checked over a few of the non-JLPT books I have, and by and large, all the words in all of them are in the core6k list. This is great for my general learning, but it still doesn't solve the problem of the JLPT.

Would this be a good general course of action -

1. Merge in all the remaining JLPT4/3 vocab to my core6k deck and get going, but ignoring the core6k index order and just focusing on the JLPT4/3 words until I'm confident enough to take the test;
2. At the same time, work on mining good grammar example sentences from my collection of textbooks until I have enough understanding of those as well;
3. Once both of those are done, just plough through my new mixed vocab/grammar deck (including the rest of core6k, now in index order) until road's end?
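The ordering in steps 1 and 3 could be sketched roughly like this in Python. This is only an illustration: the card fields (`word`, `jlpt`, `core_index`) and the sample values are hypothetical, not the layout of any real Anki deck or of the core6k list, and the levels and index numbers shown are invented.

```python
def study_order(cards):
    """JLPT4 words first, then JLPT3, then everything else in Core index order."""
    level_rank = {"JLPT4": 0, "JLPT3": 1}

    def key(card):
        rank = level_rank.get(card.get("jlpt"), 2)
        index = card.get("core_index")
        # Words merged in from the JLPT list have no Core index,
        # so they go last within their own level.
        return (rank, index is None, index or 0)

    return sorted(cards, key=key)


# Hypothetical sample cards (levels and index numbers are made up)
cards = [
    {"word": "経済", "jlpt": None, "core_index": 812},
    {"word": "食べる", "jlpt": "JLPT4", "core_index": 95},
    {"word": "箸", "jlpt": "JLPT4", "core_index": None},
    {"word": "意見", "jlpt": "JLPT3", "core_index": 640},
    {"word": "水", "jlpt": "JLPT4", "core_index": 40},
]

ordered = [card["word"] for card in study_order(cards)]
print(ordered)  # ['水', '食べる', '箸', '意見', '経済']
```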

That seems appropriate to me, just looking for some feedback before I devote a large portion of my life to it :P

I would buy a mining hat and jump straight into Japanese literature, but time isn't on my side at the moment, nor will it be in the foreseeable future, unfortunately. Hence the notion of core6k has seemed like something of a godsend ;)

In any case, thank you so much for your help, both of you! :) Needless to say, this thread has been more helpful to my learning already than four years of grade-school Japanese ever could...

Last edited by Xero Xenith (2010 June 24, 8:28 am)

Reply #6 - 2010 June 24, 9:21 am
Nukemarine Member
From: 神奈川 Registered: 2007-07-15 Posts: 2347

Now, there is an option if you want to import a lot of words but not the duplicates from another list. However, you have to be a little savvy with using spreadsheets and formulas.

Use the spreadsheet with Core 2k and 6k, and include markers in columns so you know the position of each word: for example, "Core 2k; Step 10; Word 025" or "Core 6k; Step 12; Word 368". This will help with sorting.

Next, get a spreadsheet of your JLPT words, hopefully all of them from 4 to 1. These should have markers such as "JLPT 4", again for ease of sorting.

Both sheets should have the kanji word and its kana, and maybe even an English translation. Now, sort by kana, then by the markers so that Core comes first. After that, create a formula that compares the kanji of words with the same kana and leaves a marker like "duplicate". From there, it's a matter of removing the dupes, leaving you with pretty much only unique words. You should end up with a JLPT list that you can sort and import into your Anki vocabulary deck.

Note that this seems complicated, but it isn't too bad. Also, you won't weed out every dupe because of variants when it comes to kanji. In addition, there's the rare case where you have the same word with different meanings. I did the above using the Tanuki word list (7000+ entries) against Core 2k/6k. Almost 3500 words were weeded out.
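A rough sketch of the same duplicate check in Python, for those who would rather not fight with spreadsheet formulas. It swaps the sort-and-compare formula for a set lookup on (kana, kanji) pairs, which should have the same effect. The `Entry` type, the `drop_duplicates` function, and the sample words and markers are made up for illustration; they are not taken from anyone's actual spreadsheet.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    kanji: str
    kana: str
    marker: str  # e.g. "Core 2k; Step 10; Word 025" or "JLPT 4"


def drop_duplicates(core, jlpt):
    """Return the JLPT entries whose (kana, kanji) pair is not already in Core."""
    seen = {(entry.kana, entry.kanji) for entry in core}
    unique = []
    for entry in jlpt:
        key = (entry.kana, entry.kanji)
        if key not in seen:
            seen.add(key)  # also drops repeats inside the JLPT list itself
            unique.append(entry)
    return unique


# Hypothetical sample data (the markers are invented)
core = [
    Entry("食べる", "たべる", "Core 2k; Step 10; Word 025"),
    Entry("橋", "はし", "Core 6k; Step 12; Word 368"),
]
jlpt = [
    Entry("食べる", "たべる", "JLPT 4"),  # same kana and kanji as a Core entry: dropped
    Entry("箸", "はし", "JLPT 4"),        # same kana, different kanji: kept
    Entry("飲む", "のむ", "JLPT 4"),      # not in Core at all: kept
]

remaining = drop_duplicates(core, jlpt)
print([entry.kanji for entry in remaining])  # ['箸', '飲む']
```

As with the spreadsheet version, kanji variants of the same word will not be caught, since the comparison is an exact match on the written form.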

Reply #7 - 2010 June 24, 11:20 am
kame3 Member
From: Netherlands Registered: 2009-09-01 Posts: 133

You could also download the huge CorePLUS deck on Anki. It has tags for Core 6000 and JLPT, so you could just delete the Core6000 cards in that deck and import your deck into it.

For duplicate checking you could also use the Jxplugin.

Last edited by kame3 (2010 June 24, 11:22 am)

Reply #8 - 2010 June 24, 11:50 am
Xero Xenith New member
Registered: 2010-06-05 Posts: 9

Thanks for your suggestions guys!

kame3, I've just given that deck a look, but the tags for the JLPT series aren't complete - they're not really usable. Thanks for the idea though, and I might come back to that deck later on.

I've done some playing with Nukemarine's idea, and came up with a spreadsheet that does it all automatically. It seems to be the ultimate resolution to my problem - thank you! :)

Before I get started, what do people think of my course of action (outlined two posts up)? I'd like some feedback on that, if possible :)

Last edited by Xero Xenith (2010 June 24, 12:00 pm)

Reply #9 - 2010 June 24, 1:14 pm
Thora Member
From: Canada Registered: 2007-02-23 Posts: 1691

Thanks for this, XX! I spent frustrating ages trying to figure out how to remove duplicates with Excel, but it was beyond my skill level. And today, it magically appears. :)

Reply #10 - 2010 June 24, 10:36 pm
Asriel Member
From: 東京 Registered: 2008-02-26 Posts: 1343

First off, excellent job with that excel file. I'm pretty sure I'll be using it sometime in the future, unless someone makes everything I want before I get there.

What I would like is a deck of "JLPT Vocab 1-4(5) that is not covered in Core6000, tagged by JLPT Level"

And then i would personalize it more by adding:
"---with definitions/examples from 研究社 instead of EDICT"


another question:
could I use your excel thing to:
1. export my deck of random vocab words found in the wild
2. Put them in instead of Core6000
3. Find words on the JLPT list that aren't in my deck already?

Reply #11 - 2010 June 25, 5:33 am
cangy Member
From: 平安京 Registered: 2006-12-13 Posts: 372 Website

janettek wrote:

Would it be helpful or not, if I re-posted the CorePLUS deck with tags for all the JLPT words?

kore has jlpt level if that helps (555/683 JLPT4, 510/624 JLPT3, 2665/3899 JLPT2, 895/2955 JLPT1, 1375 other)

janettek wrote:

I was initially too lazy to run overwritefields again.

there's a less-tedious overwrite fields replacement available now

Reply #12 - 2010 June 25, 12:56 pm
Xero Xenith New member
Registered: 2010-06-05 Posts: 9

I really wasn't expecting people to leap at the Excel file so soon! Feel free to do whatever you like with it, it's obviously not perfect. If you'd like to clean it up and share, please do so.

janettek wrote:

Would it be helpful or not, if I re-posted the CorePLUS deck with tags for all the JLPT words? I was initially too lazy to run overwritefields again. Currently, only the JLPT words 1-4 that aren't already in core6000 are tagged with JLPT levels. Also, I was looking at the word lists at http://www.tanos.co.uk and it seemed that the number of words they quoted for JLPT levels 1 through to 5 did not quite match what was in their tables or .doc files. Do you have any other suggestions for word lists?

My personal favourite is this website. He recently updated it for the new series of JLPT exams (without N3, of course). In fact, the new JLPTs (as of next month) do not have official wordlists at all, but they're meant to be the same difficulty as the JLPTs of old, so I'm sure the old lists are a more than reasonable resource.

I'd be extremely grateful if you could update it with all of this! It seems like a very, very impressive deck, and I'm completely with you on your reasoning behind it - it's easier to unsuspend than to add! :P

One more idea to improve the deck - for the sake of conformity, you could also clean up the "definition" section for the JLPT words so it includes only the straight definition, and not any other information (that has the tendency to seem clutter-ey, if you get my drift). Of course, it's your deck and not mine, and I'd be happy to export and do this myself in Excel, but I thought I'd mention it. :)

Asriel wrote:

What I would like is a deck of "JLPT Vocab 1-4(5) that is not covered in Core6000, tagged by JLPT Level"

Yes, you could do that with my spreadsheet. You could add the words for one level, use my sheet (hehe, sounds rude... sorry), remove/check, then repeat. Or, you could add them all in one go with the tags next to them. It's up to you :)

Asriel wrote:

another question:
could I use your excel thing to:
1. export my deck of random vocab words found in the wild
2. Put them in instead of Core6000
3. Find words on the JLPT list that aren't in my deck already?

Yes, of course! Simply remove the contents of the "A" and "B" columns in the "core6k" sheet, and fill in your own. You would need to edit some of the formulas if your "random vocab" is more than 6000 words; the sheet specifies some ranges in the core6k sheet that only go up to 6000. These cells are F2, G2 and I2 in the "duper" sheet. (Though, looking back, I didn't actually specify 6000; two say "6728" and one says "7358". That should be cleaned up.)

Reply #13 - 2010 June 26, 9:54 am
cangy Member
From: 平安京 Registered: 2006-12-13 Posts: 372 Website

janettek wrote:

It's a shame, since I intentionally left all the fields from kore.text in the CorePLUS deck, the JLPT levels should be there. Perhaps I missed the field, or perhaps it was added later.

it was added later

janettek wrote:

There will still be a bit of rubbish as the only way to get a really clean deck would be to start again, and I don't want to take the time or lose my scheduling/progress.  I believe there is no way to export due dates etc for cards.  Is this correct?

for the kore facts you can overwrite them all in one go with franki by keying off the index field, but for other stuff you've added you'd need a unique key...
