Printable core 2000 vocabulary list - kanji koohii FORUM (http://forum.koohii.com), Learning Japanese > Learning resources (/thread-9762.html)
Printable core 2000 vocabulary list - Savii - 2012-08-13
Currently I'm studying the original Core2k using Anki. I wanted a copy on paper so I could study the words away from a screen as a complement to SRS, so I did some coding and produced a (hopefully) good-looking PDF. Perhaps other people will also find this useful, so I thought I'd post it here.

Update: there are now several PDF versions available, as well as a spreadsheet that can be used as source data for an SRS deck.

Update 2: added new versions of the iKnow PDFs and TSV (with furigana and several other enhancements), and links to derived works.

Update 3 (a few years later): started producing updated files in light of the recent developments regarding iKnow and Anki. I don't need Core for myself anymore, but it would be nice to have some up-to-date archived source material for people to work with in case the API disappears behind a paywall manned by evil lawyers or something. Please let me know if I screw something up; I didn't have much time to double-check the results.

March 2015 iKnow! list

Same as below, based on a recent copy of the site's data. I also added some fields (romaji transcriptions and sentence difficulty rating). For now only the raw data, spreadsheet and media files are provided, so you can make your own deck and possibly share it with others (I'll gladly include quality decks in this post). I'll try to add furigana and new PDFs (like the section below has) later when I have time to sort that out.
- TSV (spreadsheet with Anki-compatible media references): Mediafire, Mega
- Media pack (sentence/word audio and pictures): Mega [321 MiB]
- JSON (raw data for programmers): Mediafire, Mega

November 2012 iKnow! list

Ordering and translations are much more up to date than Core PLUS below. A furigana version is now available (though there may be markup glitches in some places). Goes up to 6000. Not in sync with most shared Anki core decks, but the TSV file can be used to create one.
PDF variations (divided into 6 documents of 1k each):
- Furigana & translations: preview (first 1k), Mediafire download
- Furigana but no translations: Mediafire download
- Kana transcriptions & translations: preview (first 1k), Mediafire download

TSV (spreadsheet with all 6k): Mediafire
JSON (raw data for programmers): Mediafire

Other works based on this data:
- Created by buonaparte: a set of Office documents including media
- Created by frony0: a ready-to-use Anki deck with media packages

Core PLUS list

Based on an older Anki deck from the smart.fm days; it has some outdated and incorrect entries. It has full furigana for the example sentences and contains the original core 2000. Its only advantage over the newer lists above is that the order is synchronized with Core PLUS and many other popular pre-made Anki decks.

PDF (single document): preview (complete 2k), Mediafire download

Printable core 2000 vocabulary list - frony0 - 2012-08-13
Very nice! I hope this doesn't get lost in the forum archives. Very useful.

Printable core 2000 vocabulary list - turvy - 2012-08-13
At the bottom: 今日は一人で映画を見ます I saw a movie by myself today. That can't be right.

Printable core 2000 vocabulary list - Savii - 2012-08-13
turvy Wrote: At the bottom:
Well, the list is extracted from a public Anki deck, which in turn is based on an official smart.fm deck from when that website still existed. Since there are so many items, it's natural that there will be some errors. I've already fixed a few entries manually and I don't mind fixing a few more; however, I personally don't think it's worth extensively quality-checking the whole list. I'm considering redoing this with a different source (the current iKnow! website), though. It'll be a lot more up to date and I could easily go up to 6k as well. The downside would be the misalignment with most pre-made Anki decks, since Cerego has shuffled the order quite a bit.

Printable core 2000 vocabulary list - PotbellyPig - 2012-08-13
It would interest me if you could do this with iKnow! as the source. iKnow! has updated their definitions since the Anki deck was created. Is there any way to take the HTML source from the page where you can see all the words in a lesson and add them to a spreadsheet? This would be a great time saver for me.

Printable core 2000 vocabulary list - s0apgun - 2012-08-13
http://pastebin.com/s8HEf27Z
Pastebin link for CORE2k vocabulary with single-character lines removed. Thanks to Nestor / http://darkjapanese.wordpress.com/

Printable core 2000 vocabulary list - Savii - 2012-08-14
PotbellyPig Wrote: It would interest me if you could do this with iKnow! as the source. iKnow! has updated their definitions since the Anki deck was created. Is there any way to take the HTML source from the page where you can see all the words in a lesson and add them to a spreadsheet? This would be a great time saver for me.
I did some research just now and it's actually easier than I'd thought: there's no need to parse any HTML pages, since the entire 6k is served as programmer-friendly JSON. In fact, I've already managed to update the list with this as the new source. The only thing I'm still struggling with is furigana; iKnow has sentences in both kanji and kana, and I'm trying to create a procedure for reliably merging the two into a single kanji sentence with furigana.

Printable core 2000 vocabulary list - frony0 - 2012-08-14
Perhaps it's possible to update the decks on Anki to use the new data too. There's a Core1k deck, but none of the rest of the revised series. Can you describe the API?

Printable core 2000 vocabulary list - Savii - 2012-08-14
Update: furigana is going pretty well, except I'm not sure how reliable it is. I'll try to check the result against possible kanji readings to verify correctness.
frony0 Wrote: Perhaps it's possible to update the decks on Anki to use the new data too. There's a Core1k deck, but none of the rest of the revised series. Can you describe the API?
The website uses asynchronous requests (AJAX) to fetch JSON packages containing 100 vocab items each (one course step, e.g. core 3000 step 4), which are then formatted client-side. So the entire core 6k consists of 60 JSON datasets. I uploaded an archive with all of the files in it, named by course id; if they are processed in course id order, you should get the complete 6k in the proper order. I could also generate an Anki-importable (tab-separated) file if that helps, but no more than that; I don't know anything about creating high-quality Anki decks. By the way, I assume this thread is okay with the forum rules, since the raw information is available publicly and free of charge, and the provided compilations are original work?

Printable core 2000 vocabulary list - PotbellyPig - 2012-08-14
Great work! Would it be possible to put the full 6k into a spreadsheet? Of course I want your PDF as well, but with a spreadsheet it's easier to manipulate the items, and we can update the Anki deck with the iKnow! order and the new definitions.

Printable core 2000 vocabulary list - frony0 - 2012-08-14
Ah, I have little to no experience with AJAX or manipulating JSON responses, so if you could convert that into a database, CSV or a similar format, that would be great. I'm happy to format it all into a deck, although I think Nukemarine has some scripts he should run on the list first to "optimize" it.
Printable core 2000 vocabulary list - Savii - 2012-08-14
The furigana didn't work out after all, so I'll keep it as separate kanji/kana for now. I've produced a PDF version (split into six files) and a TSV dump (one big file). TSV is basically CSV with tabs instead of commas; it's importable in Anki and any modern spreadsheet program. Will this be sufficient? I'm not sure which formatting details are optimal for an SRS deck, so if anything in the sheet needs changing, just say the word. Especially the audio/pictures: right now they're just the original URIs on iknow.jp, but I remember something about Anki needing some special setup for media. I'll update my original post with the new download links.
frony0 Wrote: I'm happy to format it all into a deck, although I think Nukemarine has some scripts he should run on the list first to "optimize" it.
I suppose you're talking about i+1 order? That's pretty nice, but there are advantages to frequency order as well, so I suggest publishing it as an alternative.

Printable core 2000 vocabulary list - PotbellyPig - 2012-08-14
Savii Wrote: I'll update my original post with the new download links.
Can't wait to check out that .tsv file. I'll let you know how it looks after you post a link for it.

Printable core 2000 vocabulary list - frony0 - 2012-08-14
Savii Wrote: The furigana didn't work out after all, so I'll keep it as separate kanji/kana for now. I've produced a PDF version (split into six files) and a TSV dump (one big file). TSV is basically CSV with tabs instead of commas; it's importable in Anki and any modern spreadsheet program.
Yeah, I saw that when I had a fiddle with the JSON files. I'm pretty sure Anki can handle remote audio and pictures though, and it's fairly easy to download them in-app too. One big TSV is fine though; it's easy to split it up by line, and many people here can work wonders with spreadsheets...
As for the order, it's just helpful to have that as an extra index, instead of the multiple-separate-spreadsheets situation we have now.

Printable core 2000 vocabulary list - PotbellyPig - 2012-08-14
PotbellyPig Wrote: Savii Wrote: I'll update my original post with the new download links. Can't wait to check out that .tsv file. I'll let you know how it looks after you post a link for it.
Wow. Great job! I've been using iKnow! for a while now (up to Core 4000 Step 5). I've been maintaining a spreadsheet based on the Anki deck and updating the definitions where applicable by cutting & pasting from the website. This will save me countless hours of work. Plus, when I'm finished going through all 6000 words via iKnow!, I'll have an Anki deck with the updated definitions. You really saved me from a boatload of work, and the time can be used for studying instead.

Printable core 2000 vocabulary list - PotbellyPig - 2012-08-15
I just noticed something. The TSV file doesn't have the part of speech for each word (verb, noun, etc.). Can you add it?

Printable core 2000 vocabulary list - Savii - 2012-08-16
PotbellyPig Wrote: I just noticed something. The TSV file doesn't have the part of speech for each word (verb, noun, etc.). Can you add it?
Sure, I've added that column to the spreadsheet. The TSV download links are updated. Also, I'll be updating the documents once more in the next few days. Personally I really like furigana, so I gave it another go by finding out how Anki's Japanese Support plugin does the job. I got the method they use (a morphological analyzer called MeCab) to work in my script, so if nothing unexpected pops up I should be able to roll out furigana versions soon. I suppose I won't need to add it to the TSV, since programs like Anki can add it dynamically anyway?
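Savii's eventual approach used MeCab. As a simpler, swapped-in illustration of the "merge kanji and kana into one sentence with furigana" step, here is a minimal Python sketch that aligns a kanji sentence with its kana reading using a regular expression: kana and punctuation must match the reading literally, and each run of kanji captures whatever reading fills the gap. The output uses Anki-style `漢字[かな]` markup; the function names are hypothetical, and treating spaces in the reading as layout only is an assumption.

```python
import re
from typing import Optional

# Characters treated as "needing furigana": CJK unified ideographs plus the
# repetition mark 々. Everything else (kana, punctuation) is copied literally.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff\u3005]+")


def add_furigana(kanji: str, kana: str) -> Optional[str]:
    """Align a kanji sentence with its kana reading.

    Returns markup such as '今日[きょう]は 一人[ひとり]で', or None when the
    reading cannot be aligned at all.
    """
    kana = "".join(kana.split())  # assumption: spaces in the reading are layout only
    runs = []  # (text, is_kanji) pairs covering the whole sentence in order
    pos = 0
    for m in KANJI_RUN.finditer(kanji):
        if m.start() > pos:
            runs.append((kanji[pos:m.start()], False))
        runs.append((m.group(), True))
        pos = m.end()
    if pos < len(kanji):
        runs.append((kanji[pos:], False))

    # Kanji runs become lazy capture groups; everything else must match literally.
    pattern = "".join("(.+?)" if is_kanji else re.escape(text) for text, is_kanji in runs)
    match = re.fullmatch(pattern, kana)
    if match is None:
        return None

    out = []
    readings = iter(match.groups())
    for i, (text, is_kanji) in enumerate(runs):
        if is_kanji:
            # A space before each annotated run (except at the very start)
            # marks where its base text begins.
            out.append((" " if i > 0 else "") + f"{text}[{next(readings)}]")
        else:
            out.append(text)
    return "".join(out)


def kana_of(marked: str) -> str:
    """Turn the markup back into plain kana, e.g. to compare with a transcription."""
    return re.sub(r" ?([^ \[\]]+?)\[([^\]]+)\]", r"\2", marked)
```

Because the lazy groups take the first reading that fits, sentences where a particle or okurigana also occurs inside a kanji reading (e.g. 母は… read as はは…) can be split in the wrong place. `kana_of` can check MeCab-style output against the original kana transcriptions, but it cannot catch a misplaced split from `add_furigana` itself, since that output always round-trips.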
Printable core 2000 vocabulary list - frony0 - 2012-08-16
You need to:
Code:
# Reorder the TSV columns into Anki fields: wrap the audio columns ($6, $10) in [sound:...],
# wrap the image URL ($11, minus its last character) in <img>, and move $1 to the end.
awk -F'\t' '{print $2 "\t" $3 "\t" $4 "\t" $5 "\t[sound:" $6 "]\t" $7 "\t" $8 "\t" $9 "\t[sound:" $10 "]\t<img src=\"" substr($11,1,length($11)-1) "\" />\t" $1}' < iknow_core6k_complete.tsv
I've made a quick deck; I'll upload zips/tgzs of the media a little later. Do add the furigana though, at the very least because the mobile client doesn't generate it dynamically like the desktop one does.

Printable core 2000 vocabulary list - buonaparte - 2012-08-16
http://users.bestweb.net/~siom/martian_mountain/iKnow6000/
For people who prefer parallel texts. The texts were generated from iknow_core6k_complete.tsv posted here: http://forum.koohii.com/showthread.php?tid=9762
Three .doc files (Japanese, spaced hiragana, English) with links to offline audio:
- iKnow6000 Sentences.doc
- iKnow6000 Vocab + Sentences.doc
- iKnow6000 Vocabulary.doc
MP3 files renamed: 0001s.mp3 to 6000s.mp3 for sentences, 0001.mp3 to 6000.mp3 for vocabulary. Audio playlists added (sentences only). The original MP3 file names are a complete mess, so these are renamed. It is not an Anki deck; the files are mostly meant for listening practice/reviewing, and it is easy to check against the written file provided here when in doubt.

Printable core 2000 vocabulary list - PotbellyPig - 2012-08-16
Savii Wrote: PotbellyPig Wrote: I just noticed something. The TSV file doesn't have the part of speech for each word (verb, noun, etc.). Can you add it? Sure, I've added that column to the spreadsheet. The TSV download links are updated.
Thanks for adding it so quickly.

Printable core 2000 vocabulary list - frony0 - 2012-08-16
It appears to no longer be possible to upload decks to the old AnkiWeb, so see http://dl.dropbox.com/u/8980026/core.anki for the deck. For the media, I have http://dl.dropbox.com/u/8980026/core1.zip for the vocab audio, http://dl.dropbox.com/u/8980026/core2.zip for sentence audio, and http://dl.dropbox.com/u/8980026/core3.zip for (sentence) images.
The media files have been renamed to something more "neat".
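frony0's awk command above wraps the audio columns in `[sound:...]` and the image in `<img src="..." />`, which is how Anki fields reference media files. Below is a minimal Python sketch of the same field formatting combined with neat, zero-padded local file names, borrowing buonaparte's numbering (0001.mp3 for words, 0001s.mp3 for sentences). The image naming and the function names are assumptions, and the files themselves still have to be downloaded and renamed to match.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def neat_name(index: int, url: str, suffix: str = "") -> str:
    """Zero-padded file name that keeps the URL's extension, e.g. '0001s.mp3'.

    The numbering follows buonaparte's scheme (0001.mp3 for words,
    0001s.mp3 for sentences); naming images the same way is an assumption.
    """
    ext = PurePosixPath(urlparse(url).path).suffix  # ignores any ?query part
    return f"{index:04d}{suffix}{ext}"


def anki_media_fields(index: int, word_audio: str, sentence_audio: str, image: str) -> list:
    """Anki field values that reference the renamed local media files."""
    return [
        f"[sound:{neat_name(index, word_audio)}]",
        f"[sound:{neat_name(index, sentence_audio, 's')}]",
        f'<img src="{neat_name(index, image)}" />',
    ]
```

Referencing local names rather than the original iknow.jp URLs keeps the deck working offline, which matters for the mobile client.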
Printable core 2000 vocabulary list - frony0 - 2012-08-16
I suggest nobody upload the old core decks to AnkiWeb 2; we can use these instead.

Printable core 2000 vocabulary list - Savii - 2012-08-23
MeCab's fully automatic furigana generation wasn't as accurate as I'd hoped, so I ended up spending a lot more time on this than I thought I'd need. Thanks to some additional generation methods it should be good to go now: 99.5% was verified against the original kana transcriptions, and I've fixed the remaining ones myself. Both the PDF and TSV files are now updated with these furigana transcriptions. I've also added rōmaji transcriptions to the spreadsheet (for completeness, though I doubt anyone wants them) and made several improvements to the printable lists. To frony0 and buonaparte: great work on those files, I've added them to the first post as well.

Printable core 2000 vocabulary list - Savii - 2012-08-25
I discovered a bug in my scripts which caused about 1% of the words to have an incorrect furigana reading. Fixed PDFs and TSV have been uploaded.

Printable core 2000 vocabulary list - corry - 2012-11-13
Is the second sentence missing? "I went to the pool." P.S. I was looking at the JSON. It looks like the ready-made deck doesn't even bother with half the sentences. Has anyone got an up-to-date deck with all the sentences?