Joined: Feb 2008
Posts: 1,322
Thanks: 0
So there's this plugin for Anki that adds example sentences from the Tanaka Corpus, but we all know what the downsides of the Tanaka Corpus are...
So I found the Tanuki Corpus floating around here somewhere, but even through Google, I can't figure out where it came from or how reliable it is. I have seen it referred to as "infamous," so I'm just wondering what is up with it.
Joined: Oct 2007
Posts: 4,582
Thanks: 0
I can't even remember much about it now. Someone named Ken Something posted a thread with links, describing the sentences as made by natives as part of some project or other, but after a couple of pages of replies asking for more information, they deleted their entire thread, perhaps from copyright paranoia? The sentences seem to be arranged in a way similar to KO2001, but obviously without translations, and they were later converted into new formats by folks who had managed to DL them before the links disappeared. We think they might have been originally made for teaching software. I think we thought that. Can't remember. ;p This was before we had subs2srs, etc., so I sort of lost interest, personally.
Joined: Jul 2007
Posts: 1,879
Thanks: 19
Tanuki follows the order in Kanji in Context exactly. It makes me wonder if it's somehow related to an electronic version of the project. And that's about as far as I'm going to speculate on that.
Other than that, it seems to be rock-solid. I sorted through it, and nothing weird or oddball jumped out. The sentences seem to be professionally done by education types.
You can re-sort it for KO pretty easily with KanjiSort and a good sorting list. It just has an ungodly number of entries in it, so you need to strip out a lot of the data you don't need. No English translations, but you really don't need 'em. Lots of extra sentences, too. About 7,150 sentences total, covering ~2,000 kanji.
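For anyone who would rather script the re-sort than use KanjiSort, here is a rough Python sketch. It is not KanjiSort's actual algorithm, just one plausible approach: `kanji_order` is a hypothetical sorting list (kanji in the order your textbook introduces them), and each sentence is placed by the latest-introduced kanji it contains, so it only shows up once all of its kanji have been covered. Sentences with kanji missing from the list go to the end.

```python
def is_kanji(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def resort(sentences, kanji_order):
    """Sort sentences by the latest-introduced kanji each one contains."""
    rank = {kanji: i for i, kanji in enumerate(kanji_order)}
    unknown = len(kanji_order)  # kanji not in the list sort after everything else

    def key(sentence):
        ranks = [rank.get(ch, unknown) for ch in sentence if is_kanji(ch)]
        # Sentences with no kanji at all sort first.
        return max(ranks, default=-1)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(sentences, key=key)


# Hypothetical sample data, not taken from the actual corpus.
kanji_order = ["日", "本", "語", "学"]
sentences = ["日本語を学ぶ。", "日本です。", "今日は日曜日。"]
resorted = resort(sentences, kanji_order)
```

With the real file you would first have to pull the sentence field out of each entry, which depends on whichever format of the corpus you have.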
It's a great supplement for KiC if you're looking for more sentences. Also good for KO learners if you're done with books 1 & 2, and don't want to mess with Core 2+6k sentences... that's pretty much it.
EDIT: My source of choice for reliable sentences is the yahoo.co.jp dictionary, because it's quick and dirty. (It also has 2 和英 dictionaries for more choices of sentences.) Drop it in the search bar in Firefox with Mycroft, then copy and paste into Anki. It doesn't always work, but it's Good Enough.
Edited: 2010-03-09, 3:04 am
Joined: Mar 2007
Posts: 3,851
Thanks: 0
It's the Tanaka corpus that's infamous (for being complete crap). The Tanuki corpus is a lot less well known/used.
Joined: Jul 2007
Posts: 1,879
Thanks: 19
Yeah, I've used Tanuki on and off as a last-ditch source for good sentences ever since it came out in that weird thread. The sentences are perfectly fine.
Joined: Jul 2007
Posts: 2,313
Thanks: 22
Yeah, the history for that list was odd. It's supposedly covered under the common user license or some such. I also have the original text file that I edited to put onto Google Documents.
The original had some very useful segments for creating a computer-based proficiency test. In other words, it had distractors for definitions, sentence use, and pronunciation. In addition, it came with ruby tags, which makes it versatile in how you test with it.
To be honest, with the original file, someone could program an interface and have a damn good app to put on the iPod touch or some flash program.
PS: When I did an automated trim of the Tanuki corpus against Core 2k/6k and against duplicate vocab entries within Tanuki itself, I removed about 3,000 entries. Since it was automated, some of the removed entries were probably unique in meaning, just not in kana/kanji.
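A trim like that can be sketched in a few lines of Python. This is only an illustration, not the script that was actually used: the `(vocab, sentence)` layout and the `core_vocab` set are assumptions, and the sample entries are made up. It also shows the caveat above, since matching on the written form alone throws out homographs such as 辛い (からい "spicy" vs. つらい "painful").

```python
def trim_entries(entries, known_vocab):
    """Drop entries whose vocab is already known or already seen earlier.

    entries: list of (vocab, sentence) tuples (assumed layout).
    known_vocab: set of written forms from another deck, e.g. Core 2k/6k.
    Returns (kept, removed).
    """
    seen = set(known_vocab)
    kept, removed = [], []
    for vocab, sentence in entries:
        if vocab in seen:
            # Matched purely on the written form, so a homograph with a
            # different reading or meaning is removed here as well.
            removed.append((vocab, sentence))
        else:
            seen.add(vocab)
            kept.append((vocab, sentence))
    return kept, removed


# Hypothetical sample data, not taken from the actual corpus.
core_vocab = {"食べる"}
entries = [
    ("食べる", "朝ご飯を食べる。"),
    ("辛い", "このカレーは辛い。"),
    ("辛い", "別れは辛い。"),
    ("学校", "学校に行く。"),
]
kept, removed = trim_entries(entries, core_vocab)
```

Keeping the `removed` list around makes it easy to skim for entries that deserve to be put back by hand.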