Printable core 2000 vocabulary list - Printable Version
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Printable core 2000 vocabulary list (/thread-9762.html)
Printable core 2000 vocabulary list - Shinichirou - 2012-11-13

Are there 120 people with a good command of the language who could create a comprehensive Core 6000 PDF? That's 50 words per person, but with about 3 examples for each word (examples that actually make sense and show the most prominent contexts in which the word is used). How about that? Does anyone approve of this idea?

Printable core 2000 vocabulary list - lauri_ranta - 2013-01-03

I updated the Core 6000 text file on my website so it's now based on the TSV file from this thread instead of kore.txt from https://sites.google.com/site/ankinihongo/home/kore. I added columns for furigana for the vocabulary items, RTK keywords, first translations of the vocabulary items, word types like two-kanji compound or katakana, word frequency, and sentence difficulty based on the frequency of morphemes.

Edit: I have updated the text file on my website again so it's based on the same JSON files as the files in this thread. I copied the furigana for the sentences from Savii's TSV file though.

Some errors or inconsistencies in formatting in the original data:
- About 100 hiragana sentences have two consecutive spaces or spaces around punctuation characters
- A few fields include a space at the end or two consecutive spaces
- Error in the translation for the sentence: 218, 747, 1281, 1632, 2580, 3269, 3367, 4511, 4724, 4923, 5405, 5538, 5966
- Error in the translation for the vocabulary item: 1015
- Wrong reading in the kanji sentence: 5896
- Wrong reading in the hiragana sentence: 4582, 4725
- Wrong bold part in the kanji sentence: 831, 4165, 4472
- Wrong bold part in the hiragana sentence: 187, 3309, 5482, 5598
- Full-width characters in the hiragana sentence: 1176, 1611, 3137, 4525
- No thousands separator in the hiragana sentence: 878, 1558
- No period at the end of the kanji sentence: 1529, 2914
- ASCII space in the kanji sentence: 1782, 5748

Actually there are so few errors that it might be better to just not make any changes to the original data.
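Several of the formatting problems listed above are mechanical enough to check with a script. The following is a minimal Python sketch of such a check, not the script that produced the list above. The function name and the exact rules are assumptions: in particular, "full-width character" is taken to mean full-width digits and Latin letters, and any run of four or more ASCII digits is treated as a missing thousands separator.

```python
import re

def find_formatting_issues(text):
    """Return labels for the formatting problems found in one field (heuristic)."""
    issues = []
    if "  " in text:
        issues.append("two consecutive spaces")
    if text != text.strip():
        issues.append("leading or trailing space")
    # ASCII or ideographic space directly before or after Japanese punctuation.
    if re.search(r"[ \u3000][、。]|[、。][ \u3000]", text):
        issues.append("space around punctuation")
    # Assumption: only full-width digits and Latin letters count here.
    if re.search(r"[\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A]", text):
        issues.append("full-width character")
    # Assumption: four or more ASCII digits in a row lack a thousands separator.
    if re.search(r"[0-9]{4,}", text):
        issues.append("no thousands separator")
    return issues

print(find_formatting_issues("いっかげつ 8000えん です。"))
```

The rules are deliberately crude (a year such as 2012 would be flagged as lacking a separator), so the output is a list of candidates for manual review rather than a list of confirmed errors.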
I have also made HTML files for reviewing the

Printable core 2000 vocabulary list - cangy - 2013-01-28

lauri_ranta Wrote: I fixed some of them when I was trying to write my own scripts for adding furigana.

Did that work out? I have one you can use if you like. By the looks of your corrections, you are doing it the same way I did...

Printable core 2000 vocabulary list - lauri_ranta - 2013-01-28

I just gave up and copied the furigana added by Savii. I'm still interested in hearing how either of you added the furigana though.

You probably know about these already, but here's some weird furigana or errors in kore.txt:
最近、銀行[さいきん ぎんこう]
一ヶ月8,000円[いっかげつ はっせんえん]
何?[なに]

Printable core 2000 vocabulary list - Savii - 2013-01-28

lauri_ranta Wrote: I just gave up and copied the furigana added by Savii. I'm still interested in hearing how either of you added the furigana though.

At first I just tried running everything through MeCab, but obviously that resulted in a large number of inaccuracies. Then I got the idea of using the fact that there were already hiragana transcriptions of every word and sentence in the core list. The result was a brute-force setup: every sentence was analyzed by MeCab (configured to output the hundred most likely readings), kakasi, and one other method (a little script attempting to match the relevant kana from the transcript to the kanji of the original); the first resulting set of readings that matched the transcript exactly was picked as the "correct" reading for that sentence. This approach yielded a result for more than 99% of the 6000 sentences, and I did the remaining "problematic" sentences by hand.

Your core vocab page with hover furigana grouped by kanji is pretty nice, by the way. But I wonder what the word selection is based on; I doubt these are the only words in the core 6k with common kanji connections?
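The selection step Savii describes (take candidate readings from several analyzers and keep the first one that reproduces the existing hiragana transcript) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Savii's actual script: the candidate analyses are hard-coded here instead of coming from MeCab or kakasi, readings are assumed to be already converted to hiragana, and the names `normalize` and `pick_reading` are hypothetical.

```python
import re

def normalize(kana):
    """Drop whitespace and punctuation so readings can be compared to the transcript."""
    return re.sub(r"[\s、。,.?!?!]", "", kana)

def pick_reading(candidates, transcript):
    """Return the first candidate whose joined readings equal the transcript.

    candidates: list of analyses, each a list of (surface, reading) pairs,
                ordered from most to least likely
    transcript: hiragana transcription of the whole sentence
    """
    target = normalize(transcript)
    for analysis in candidates:
        joined = normalize("".join(reading for _, reading in analysis))
        if joined == target:
            return analysis
    return None  # nothing matched: leave the sentence for manual correction

# Hypothetical analyzer output for 今日は暑い。 (made up for this example).
candidates = [
    [("今日", "こんにち"), ("は", "は"), ("暑い", "あつい")],
    [("今日", "きょう"), ("は", "は"), ("暑い", "あつい")],
]
best = pick_reading(candidates, "きょうは あつい。")
print(best)
```

Sentences for which `pick_reading` returns `None` correspond to the small remainder that has to be fixed by hand.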
Printable core 2000 vocabulary list - ryuudou - 2013-02-08

What are the differences between this new core list data and the older data that's in most of the decks, like Nukemarine's? I'm noticing that some words that were ranked in the 400s in the old list (首相, for example) are ranked closer to 2000 in this new list. Does anyone know how many changes in the ordering there are, and why they are so drastic?

Edit: Never mind. http://support.iknow.jp/entries/21668746-How-is-the-current-Japanese-Core-different-from-the-old-series-

I wonder if there's an easy way to reorder older decks like Nukemarine's in Anki.

Printable core 2000 vocabulary list - lauri_ranta - 2013-02-09

ryuudou Wrote: What are the differences between this new core list data and the older data that's in most of the decks, like Nukemarine's?

Here is a graph for the positions of the words on a word frequency list:

[image]

The TSV file from this thread is red and kore.txt from cangy's website is green. In the new version, there seem to be three groups of words that are sorted by frequency, starting after about 2500, 4500, and 5500.

Another graph for word positions in the TSV file from this thread (X-axis) and kore.txt (Y-axis):

[image]

Fields that were not identical:
- 166 vocabulary items (shown as -1000 in the second graph)
- 501 sentences
- 2472 translations for vocabulary items (or 1317 if only the part before the first comma is considered)
- 2597 translations for sentences

Examples of edited fields, where the new version is on the second line:

大晦日
大みそか

紙屑
紙くず

IT業界は女性の比率が低い。
IT業界は女性の比率が低い。

遅れてご免。
遅れてごめん。

year by year, annually
year by year

torment, trouble
torment (someone), trouble (someone)

What that kid said is a made-up story.
That kid is telling a made-up story.

Fortunately, I got a ticket.
Fortunately, I was able to get a ticket.

The biggest change, in my opinion, is that almost half of the English translations for both words and sentences have been edited, but I still wouldn't care which version I use.
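A field-by-field comparison like the one above can be sketched in Python as follows. This is only an illustration of the counting, not the script used for the numbers in this post. The column names (`id`, `word`, `meaning`), the assumption that both files share an ID column, and the sample rows are all made up; the pairing of words and glosses in the sample is illustrative only.

```python
import csv
import io

def first_gloss(text):
    """Keep only the part of a translation before the first comma."""
    return text.split(",")[0].strip()

def count_field_differences(old_rows, new_rows, key, fields, normalize=lambda s: s):
    """Count, per field, how many shared entries differ between two versions."""
    old_by_key = {row[key]: row for row in old_rows}
    counts = {field: 0 for field in fields}
    for row in new_rows:
        old = old_by_key.get(row[key])
        if old is None:
            continue  # entry exists only in the new version
        for field in fields:
            if normalize(old[field]) != normalize(row[field]):
                counts[field] += 1
    return counts

# Made-up sample data standing in for the two files (hypothetical columns).
old_tsv = "id\tword\tmeaning\n1\t大晦日\tNew Year's Eve\n2\t年々\tyear by year, annually\n"
new_tsv = "id\tword\tmeaning\n1\t大みそか\tNew Year's Eve\n2\t年々\tyear by year\n"
old_rows = list(csv.DictReader(io.StringIO(old_tsv), delimiter="\t"))
new_rows = list(csv.DictReader(io.StringIO(new_tsv), delimiter="\t"))

exact = count_field_differences(old_rows, new_rows, "id", ["word", "meaning"])
loose = count_field_differences(old_rows, new_rows, "id", ["meaning"], normalize=first_gloss)
print(exact, loose)
```

Passing `first_gloss` as the normalizer mirrors the "only the part before the first comma" variant of the count.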
Edit: I just realized that some of the changes in the TSV file were actually made by Savii. When I compared the TSV file with the original JSON files, 12 vocabulary items, 55 sentences, 442 translations for vocabulary items, and 132 translations for sentences were not identical. Many of the original translations are fairly awkward compared to the edited ones. I uploaded a list of the changes to http://19a5b0.s3-website-us-west-2.amazonaws.com/printable-core-differences.txt.

Printable core 2000 vocabulary list - ryuudou - 2013-02-09

lauri_ranta Wrote:
ryuudou Wrote: What are the differences between this new core list data and the older data that's in most of the decks, like Nukemarine's?
I compared kore.txt and the TSV file from this thread, and all except a few words had different positions. About 3% of the words were replaced with different words. Out of the shared words, the translations for about 40% were different.

I think most of the changes are concentrated in Core 2000. It almost appears to be completely different. Just glancing through, I see that a lot of words have moved more than 1000 places higher or lower.

Printable core 2000 vocabulary list - toshiromiballza - 2013-02-19

Any chance one of you (Savii, lauri_ranta, cangy) would release your "furiganizer" scripts to the public? I'm not aware of any software that does this in bulk automatically, except for an ancient program called JGloss, which is really buggy for me and produces terrible results when it does work (it picks the first entry from EDICT instead of, for example, going with those marked as common first). If output in HTML5 ruby tags were possible, so much the better.

Printable core 2000 vocabulary list - cangy - 2013-02-26

Savii Wrote: At first I just tried running everything through MeCab, but obviously that resulted in a large number of inaccuracies. Then I got the idea of using the fact that there were already hiragana transcriptions of every word and sentence in the core list.
The result was a brute-force setup: every sentence was analyzed by MeCab (configured to output the hundred most likely readings), kakasi, and one other method (a little script attempting to match the relevant kana from the transcript to the kanji of the original); the first resulting set of readings that matched the transcript exactly was picked as the "correct" reading for that sentence. This approach yielded a result for more than 99% of the 6000 sentences, and I did the remaining "problematic" sentences by hand.

I did it the other way around, first matching the kana with the kanji, which worked 99% of the time, then looking up a dictionary file to resolve ambiguities...

lauri_ranta Wrote: I just gave up and copied the furigana added by Savii. I'm still interested in hearing how either of you added the furigana though.

cranki compares the reading field with the mixed kanji/kana expression, so I had to correct a bunch of errors in the smart.fm data so they'd match up. I think there was extra logic to handle all the missing punctuation in the kana field just for smart.fm, which might explain that odd furigana...

toshiromiballza Wrote: Any chance one of you (Savii, lauri_ranta, cangy) would release your "furiganizer" scripts to the public? I'm not aware of any software that does this in bulk automatically, except for an ancient program called JGloss, which is really buggy for me and produces terrible results when it does work (it picks the first entry from EDICT instead of, for example, going with those marked as common first). If output in HTML5 ruby tags were possible, so much the better.

It's available again now: https://sites.google.com/site/ankinihongo/ but it's pretty nasty...
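For readers curious about the "match the kana with the kanji" step and the HTML5 ruby output asked about above, here is a minimal Python sketch. It is not cangy's cranki code or Savii's script. It assumes the reading field is the full sentence in kana with the same punctuation and no extra spaces, it makes no attempt at the dictionary lookup used to resolve ambiguities, and the names `align` and `to_ruby` are hypothetical.

```python
import html
import re

# Runs of kanji, plus 々 and ヶ so that words like 一ヶ月 stay in one run.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff\u3005\u30f6]+")

def align(expression, reading):
    """Split expression into (text, furigana) chunks; furigana is None for kana.

    Returns None when the reading cannot be aligned with the expression.
    """
    pattern = ""
    chunks = []  # (text, is_kanji)
    pos = 0
    for m in KANJI_RUN.finditer(expression):
        if m.start() > pos:
            literal = expression[pos:m.start()]
            pattern += re.escape(literal)  # kana must appear verbatim
            chunks.append((literal, False))
        pattern += "(.+?)"  # each kanji run captures its reading
        chunks.append((m.group(), True))
        pos = m.end()
    if pos < len(expression):
        literal = expression[pos:]
        pattern += re.escape(literal)
        chunks.append((literal, False))
    match = re.fullmatch(pattern, reading)
    if match is None:
        return None
    readings = iter(match.groups())
    return [(text, next(readings) if is_kanji else None) for text, is_kanji in chunks]

def to_ruby(chunks):
    """Render aligned chunks as HTML ruby markup."""
    parts = []
    for text, furigana in chunks:
        if furigana is None:
            parts.append(html.escape(text))
        else:
            parts.append(f"<ruby>{html.escape(text)}<rt>{html.escape(furigana)}</rt></ruby>")
    return "".join(parts)

print(to_ruby(align("遅れてごめん。", "おくれてごめん。")))
```

The lazy `(.+?)` groups can split a reading in the wrong place when two kanji runs are separated by kana that also occurs inside a reading, which is the kind of ambiguity a dictionary lookup would have to resolve. Katakana in the expression would also need converting before matching against an all-hiragana reading.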
Printable core 2000 vocabulary list - vebaev - 2014-01-10

Does anyone know how to batch-delete the English from the PDFs with furigana (6 documents, 1k each, in the first post) while keeping the PDFs as they are, without losing the nice formatting, so I can use the pages for practice without being distracted by the translations?

Printable core 2000 vocabulary list - Savii - 2014-01-10

vebaev Wrote: Does anyone know how to batch-delete the English from the PDFs with furigana (6 documents, 1k each, in the first post) while keeping the PDFs as they are, without losing the nice formatting, so I can use the pages for practice without being distracted by the translations?

How? Well, the easiest way would be getting the creator to dig up his templates and scripts from some dusty corner and tweak them a bit to generate builds based on your desired setup.

Here you go: https://www.mediafire.com/?i28xq5m5enoso6y

I've modified the structure a bit to actually use the extra space: there are 15 items per page now and the horizontal spacing has been reduced. So no trees dying unnecessarily!

Printable core 2000 vocabulary list - vebaev - 2014-01-10

ああ!あなたは私の神ですよ。

PS: Hope it is correct.
Printable core 2000 vocabulary list - SammyB - 2014-01-11

^ Grammatically correct, I suppose, but not something I would say (in English either, for that matter). "You are my God..." O_O

Anyway, I'm pretty sure a simple 助かりました! would sound more natural in this situation. :p

Printable core 2000 vocabulary list - Vostranoid - 2014-01-16

OK, edit for more clarity: is there, or is it possible to make, a PDF that looks as good as this one but with the order used in the optimized core deck?
Printable core 2000 vocabulary list - Inny Jan - 2014-01-16

Vostranoid Wrote: OK, edit for more clarity:

As you asked "is it possible", I will answer, "yes, it is certainly possible". Happy?

Printable core 2000 vocabulary list - Vostranoid - 2014-01-17

Inny Jan Wrote: As you asked "is it possible", I will answer, "yes, it is certainly possible". Happy?

Wow, aren't we charming.

Printable core 2000 vocabulary list - r3ftch - 2015-04-24

Thanks for the 2015 iKnow! core list, Savii.
Printable core 2000 vocabulary list - aahmyu - 2015-05-18

Hello. Thanks for the 2015 iKnow list, but I've been trying to make an Anki list from it for hours and it's not working. I'm new to Anki. I tried the Core 2000 deck with words and sentences and figured I could make this new list look like this:

[image]

Can anyone create a deck for it or tell me how to do it? Thanks again.

Printable core 2000 vocabulary list - rainmaninjapan - 2015-09-30

The original guy asking for it probably won't see it, but if anyone wants it, here is a deck made from the 2015 iKnow spreadsheet (includes audio and images, of course).

https://mega.nz/#!okMQ1RIR!iFexLEsb8DVhoNUHYApSowRQnUJsDWiOJ0uWPSvydvg

I formatted it to have the word, sentence, and image on the front, and the kana reading, translation, and audio on the back. If you want to format it differently, learn how to edit cards; it's really easy. It took me hours to properly make a deck though (first time I've used Anki).

Printable core 2000 vocabulary list - vebaev - 2015-10-03

Hi. How is the 2015 iKnow version different from the old ones? And from the 2k6k decks?

Printable core 2000 vocabulary list - rainmaninjapan - 2015-10-04

The deck I made from the 2015 list is slightly different from the "optimized core2k6k". From what I can see, the only differences are that some of the English translations are slightly different and some sentences in Japanese are different. There's also one more sentence for あげる showing a 5th meaning of it (there are like 10 or something), so there are 6000 cards instead of 5999. It also lacks furigana readings; it has just the full sentence in hiragana. There may be other changes and corrections. I figured the most recent version is likely the most correct, so I decided to use that.

Edit: From what I'm seeing now, there are a few mistakes in the 2015 version which are corrected in some of the 10k decks that are floating around. For instance, 剥げる (hageru) uses an extremely rare non-jouyou kanji, 剝げる (core 10k uses the correct one).
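On the earlier question of how to get a spreadsheet like this into Anki: Anki can import plain text files with tab-separated fields, so one rough approach is to export the spreadsheet rows and write out one line per note. The sketch below assumes hypothetical column names (`word`, `sentence`, `reading`, `translation`) and a made-up sample row; it is not how the deck linked above was built, and it leaves out audio and images.

```python
import csv
import io

# Hypothetical column names; the real spreadsheet's headers may differ.
FIELDS = ["word", "sentence", "reading", "translation"]

def to_anki_tsv(rows):
    """Turn a list of row dicts into tab-separated text, one note per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[name] for name in FIELDS])
    return out.getvalue()

# Made-up sample row for illustration.
rows = [
    {
        "word": "遅れる",
        "sentence": "遅れてごめん。",
        "reading": "おくれてごめん。",
        "translation": "Sorry I'm late.",
    }
]
tsv_text = to_anki_tsv(rows)
print(tsv_text, end="")
```

Saving `tsv_text` to a UTF-8 text file and importing it into a note type with four fields would give one note per row; which fields appear on the front and back is then set in the card template.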