Tanuki Corpus: whats the deal? - Printable Version

+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Tanuki Corpus: whats the deal? (/thread-5164.html)
Tanuki Corpus: whats the deal? - Asriel - 2010-03-09
So there's this plugin for Anki that adds example sentences from the Tanaka corpus, but we all know what the downfalls of the Tanaka corpus are... so I found the Tanuki Corpus floating around here somewhere, but even through Google, I can't figure out where it came from, or how reliable it is. I have seen it referred to as "infamous," so I'm just wondering what is up with it.

Tanuki Corpus: whats the deal? - nest0r - 2010-03-09
I can't even remember much about it now. Someone named Ken Something posted a thread with links, describing them as made by natives as part of some project or other, but after a couple of pages of replies asking for more information, they deleted their entire thread--perhaps from copyright paranoia? They seem to be arranged in a way similar to KO2001, but obviously without translations, and they were later converted into new formats by folks who had managed to DL them before the links disappeared. We think they might have been originally made for teaching software. I think we thought that. Can't remember. ;p This was before we had subs2srs, etc., so I sort of lost interest, personally.

Tanuki Corpus: whats the deal? - Asriel - 2010-03-09
I think it followed Kanji in Context, IIRC, rather than KO2001. My main question is: is it reliable? Basically what I would like to do is take the words I gather from Rikaichan, through reading things online, and add sentences to them (Rikaichan doesn't have these). The obvious idea is to add the sentence from the source itself, which I plan to do. However, I would like to have another example in there from a reliable source. Smart.fm is another good choice, provided the person who created the sentence is reliable (i.e. Cerego, smart.fm). It wouldn't be too hard to write a script that does that for smart.fm, actually. This sounds like a good application for my slowly-growing smart.fm API interface.
Tanuki Corpus: whats the deal? - rich_f - 2010-03-09
Tanuki follows the order in Kanji in Context exactly. It makes me wonder if it's somehow related to an electronic version of the project. And that's about as far as I'm going to speculate on that. Other than that, it seems to be rock-solid. I sorted through it, and nothing weird seems to pop up out of it. The sentences seem to be professionally done by education-types, because nothing oddball seems to jump out. You can re-sort it for KO pretty easily with KanjiSort and a good sorting list. It just has an ungodly number of entries in it, so you need to strip out a lot of the data you don't need. No English translations, but you really don't need 'em. Lots of extra sentences, too. About 7150 sentences, total, that cover ~2,000 kanji. It's a great supplement for KiC if you're looking for more sentences. Also good for KO learners if you're done with books 1 & 2, and don't want to mess with Core 2+6k sentences... that's pretty much it.

EDIT: my source of choice for reliable sentences is the yahoo.co.jp dictionary, because it's quick and dirty. (It also has 2 和英 dictionaries for more choices of sentences.) Drop it in the search bar in Firefox with Mycroft, copy and paste into Anki. It doesn't always work, but it's Good Enough.

Tanuki Corpus: whats the deal? - Jarvik7 - 2010-03-09
It's the Tanaka corpus that's infamous (for being complete crap). The Tanuki corpus is a lot less well known/used.

Tanuki Corpus: whats the deal? - rich_f - 2010-03-09
Yeah, I've used Tanuki on and off as a last-ditch source for good sentences ever since it came out in that weird thread. The sentences are perfectly fine.

Tanuki Corpus: whats the deal? - liosama - 2010-03-09
[image]
That was all I could think of, reading the thread title.

Tanuki Corpus: whats the deal? - Nukemarine - 2010-03-09
Yeah, the history for that list was odd. It's supposedly covered under the common user license or some such.
I also have the original text file that I edited to put onto Google Documents. The original had some very useful segments for creating a computer-based proficiency test. In other words, it had distractors for definitions, sentence use and pronunciation. In addition, it came with ruby tags, which make it versatile in how you test with it. To be honest, with the original file, someone could program an interface and have a damn good app to put on the iPod touch or some Flash program.

PS: When I did an automated trim of the Tanuki corpus against Core 2k/6k and duplicate vocab entries in Tanuki itself, I removed about 3000 entries. Since it was automated, that means some of the removed entries were probably unique in meaning, just not in kana/kanji.

Tanuki Corpus: whats the deal? - AKITOD - 2014-08-27
https://ankiweb.net/shared/info/1971975259
Here's my Tanuki deck, which is just the 7128 deck but with full kanji (may be 1 or 2 mistakes).
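
For anyone wanting to reproduce the kind of automated trim Nukemarine mentions in the PS above, here is a minimal sketch of one way it could work. It is not his actual script: the `Entry` fields, the `trim` function and the sample sentences are all made up for illustration, and it assumes entries are compared on written form only.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    # Hypothetical record layout; the real corpus file has more columns.
    word: str      # headword as written (kanji/kana)
    sentence: str  # example sentence


def trim(entries: Iterable[Entry], known_words: Iterable[str]) -> list[Entry]:
    """Keep the first entry for each written form not already in known_words."""
    seen = set(known_words)
    kept = []
    for entry in entries:
        if entry.word in seen:
            # Already covered by the other list, or a duplicate within this one.
            continue
        seen.add(entry.word)
        kept.append(entry)
    return kept


# Made-up sample data standing in for Core 2k/6k vocab and Tanuki entries.
core_words = ["食べる", "学校"]
tanuki = [
    Entry("食べる", "朝ご飯を食べる。"),
    Entry("辛い", "このカレーは辛い。"),   # からい (spicy)
    Entry("辛い", "仕事が辛い。"),         # つらい (tough), same written form
    Entry("図書館", "図書館で本を借りる。"),
]
trimmed = trim(tanuki, core_words)
```

Because the comparison is on written form only, the second 辛い entry is dropped even though it is a different word, which is the same caveat as in the PS: entries that are unique in meaning but not in kana/kanji get removed too.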