Joined: May 2014
Posts: 2
Thanks:
0
Hi,
I found out about coreXk about three weeks ago. It's amazing.
But it seems to be available only as an Anki deck, or spread across several places.
So is there a major source or a main database somewhere that contains and maintains all the images, audio files and words together?
Greets
Joined: Dec 2013
Posts: 158
Thanks:
3
I don't know about the images and audio files, but I believe Nukemarine's thread has an Excel spreadsheet of all the words at the top.
Although I'm not really sure why you'd care about finding the images and audio files separately.
Joined: Dec 2011
Posts: 116
Thanks:
4
They come from iKnow.co.jp/Smart.fm/iKnow.jp back when they licensed the official material under a CC license.
Joined: May 2014
Posts: 2
Thanks:
0
(edit: thanks for the answers! Helpful!)
As I found a number of sentences with mistakes or unnatural Japanese, it would be nice if a community could fix things immediately, rather than having a static pack of everything.
Is there any plan to make it possible to generate the core files from a bigger, accessible database? (like using the order of core with a database like tatoeba.org)
greetings
Edited: 2014-05-24, 4:45 am
Joined: Jul 2007
Posts: 1,879
Thanks:
19
You're going to find all kinds of unnatural Japanese on Tatoeba, as well as in EDICT's sample sentences, because both rely on the Tanaka Corpus, which is notorious for containing a lot of weird/unnatural stuff.
You could just take the word list from Core Xk, then run it through EPWING2Anki to get as many sentences as you want. It worked pretty well for me. The downside is that you lose the pictures and the sound, but it's quick and dirty.
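The general idea (word list in, example sentences out) can be sketched in a few lines of Python. This is not how EPWING2Anki works internally; it is only a rough illustration, assuming you already have sentence pairs loaded as (Japanese, English) tuples from some source. The function name `pick_sentences` and the `per_word` parameter are made up for this example, and the matching is a plain substring check, so conjugated forms of a word will be missed.

```python
from collections import defaultdict


def pick_sentences(words, sentence_pairs, per_word=2):
    """Collect up to `per_word` example pairs for each word (hypothetical helper).

    words          -- list of vocabulary items, in the order you want to study them
    sentence_pairs -- iterable of (japanese, english) tuples from any corpus
    """
    picked = defaultdict(list)
    for japanese, english in sentence_pairs:
        for word in words:
            # Naive substring match: assumes the word appears in dictionary form.
            if word in japanese and len(picked[word]) < per_word:
                picked[word].append((japanese, english))
    # Keep the original word-list order; words with no match get an empty list.
    return {word: picked[word] for word in words}


if __name__ == "__main__":
    words = ["猫", "食べる"]
    pairs = [
        ("猫が好きです。", "I like cats."),
        ("私は魚を食べる。", "I eat fish."),
        ("猫は魚を食べる。", "Cats eat fish."),
    ]
    for word, examples in pick_sentences(words, pairs, per_word=1).items():
        print(word, examples)
```

For real use you would probably want a morphological analyzer rather than substring matching, and the quality of the output is only as good as the sentence source you feed in.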
Joined: Mar 2013
Posts: 187
Thanks:
2
Does core2k/6k have any/a lot of unnatural Japanese?
Joined: Mar 2013
Posts: 187
Thanks:
2
What would you say high-intermediate to advanced is? I'm at a little over 3000 words in core2k/6k.
Edited: 2014-05-25, 11:28 am
Joined: Oct 2009
Posts: 3,944
Thanks:
11
One thing to remember is that in Japanese, as with any language, you can't just take a sentence and label it "natural" or "unnatural." Sometimes it depends on the context, and there's a lot of regional and individual variation among native speakers so that some speakers might accept a sentence as natural while others don't. Also when you're specifically asked to identify a sentence as natural or unnatural, you might think it's unnatural when you may not have even noticed if a native speaker said it.
Furthermore, if a sentence is unnatural in 2014 but would have been natural in 1982 (or 1955), that doesn't necessarily mean the sentence is bad depending on exactly how the sentence is presented and what the purpose of studying it is.
Joined: Sep 2011
Posts: 35
Thanks:
0
Curious about the source of the Core sentences, and wondering what the Tanaka Corpus is, I came up with this:
The Tanaka Corpus (212,000 sentence pairs) was published in 2001. It was compiled by Japanese university students as class assignments set by Prof. Tanaka. The material is riddled with errors of all kinds, ranging from transcription errors to "pairs" of completely unrelated sentences.
Jim Breen edited it to remove duplicates, spelling errors, and mismatched sentence "pairs", and incorporated the resulting 180,000 sentence pairs into JDIC as example sentences in 2003. The editing is ongoing, and the corpus is now down to about 150,000 sentence pairs.
In 2006 the Corpus was incorporated into the Tatoeba project, where it is being further edited and corrected online by volunteers of varying skill levels.
----------
The previous posts in this thread seem to imply that some part (10,000 sentence pairs) of the Corpus was incorporated into the iKnow.com websites, and the Core series was eventually extracted from that. Does anybody know if that's true, and if so, how the sentence pairs were selected and whether they were edited or corrected in any way?
Joined: Sep 2011
Posts: 35
Thanks:
0
lauri_ranta and Thora, thank you for the clarification, and the interesting stories behind the Core decks and iKnow. I do like to know the source of the materials I'm using.