Reply #1 - 2008 May 30, 10:57 am
mr_hans_moleman Member
From: Toronto Registered: 2007-06-24 Posts: 179

Seems like the thread has been deleted. Have there been any problems?

Reply #2 - 2008 May 30, 1:04 pm
Dragg Member
From: Sacramento, California Registered: 2007-09-21 Posts: 369

Very strange and with no notice either.  Maybe there was a complaint or, at least, realization of copyright infringement.  If that is the case, the thread had several posts with download links so it was probably just easier to delete the whole thing.  Too bad... but hopefully we find out specifics.  If the sentences came from a commercial resource, I might be interested in purchasing it.

Reply #3 - 2008 May 30, 3:45 pm
Dragg Member
From: Sacramento, California Registered: 2007-09-21 Posts: 369

I don't know if the original poster can delete his whole thread.  I don't see an option for it, but maybe deleting only the first post automatically deletes all the following posts as well.  In any case, all of kentank's posts from both threads (the one he created and the other about sentence mining collections) have been removed.  It doesn't seem like something Fabrice would do; he is usually very good about communicating if there is ever a problem in instances like this.

...but if this thread gets deleted too, I guess we'll have our answer. lol.

Last edited by Dragg (2008 May 30, 3:47 pm)

Reply #4 - 2008 May 30, 6:48 pm
ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

Yes, this is strange. I confirm that I did not delete the topic. I didn't realize that it was so easy for the original poster to delete the entire topic even days after having first posted. Unfortunately I just checked and I didn't find an option in the PunBB Admin that prevents a poster from deleting the topic they started when deleting the first post (this kind of thing should really be left to the moderator/admin), so I'm also disappointed that the helpful posts and activity around the sentence collection were lost.

On the other hand, perhaps this was indeed a copyright problem. I thought for one person to create such a detailed list (with all the readings, markers etc), it would have been a lot of work. Mr Tanuki has been quite silent so I'm afraid we probably won't hear much about it unless this happened someplace else too? Did anyone match the sentences to a known copyrighted source?

Reply #5 - 2008 May 30, 7:54 pm
Delina Member
From: US Registered: 2008-02-12 Posts: 102

Is this the same as the Tanaka Corpus described here:

http://www.csse.monash.edu.au/~jwb/tanakacorpus.html

If so, you can (carefully, after having read the disclaimers on the above site) search these sentences from www.mahou.org/Dict/ or download the whole thing from the site above. Tanaka-sensei released it into the public domain.

Basically these sentences were part of an ongoing assignment for university students learning Japanese. They are often direct translations of English sentences, hence extensive use of pronouns. Many of them also have mistakes, especially in the kanji readings.

Personally I won't be using them for my SRS, but I do use them for examples of grammar structures when I need help with my homework.

Reply #6 - 2008 May 30, 8:01 pm
Mcjon01 Member
From: 大阪 Registered: 2007-04-09 Posts: 551

Nope.  It was different, thus the different name. ^_^

It was apparently written by native Japanese speakers, or perhaps just one.  The details of its creation are sketchy at best, and that's being generous.

Reply #7 - 2008 May 31, 1:12 am
roderik Member
From: The Netherlands Registered: 2008-04-04 Posts: 98

I'd like to add: if someone still has this collection of superbly written sentences, would he or she please be so kind as to upload it and thus 'spread the love'?

Reply #8 - 2008 June 01, 9:29 am
zushi New member
From: California Registered: 2008-05-30 Posts: 1

Wow, those are good sentences.  Thanks for the non-linked link.

Reply #9 - 2008 June 01, 2:11 pm
ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

But we still don't know if it did in fact come from an unauthorized source, right?

There was a "Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License" text file in the original archive, but no mention of the source of the data, other than "Kensuke Tanaka".

Reply #10 - 2008 June 01, 2:54 pm
playadom Member
Registered: 2007-06-29 Posts: 468

That tanukicorpusmod file was scarily easy to find.

Reply #11 - 2008 June 10, 8:42 am
PrettyKitty Member
From: USA Registered: 2007-07-02 Posts: 178

The spreadsheet won't display in Japanese characters in Excel.
Does anyone know how to fix that?

Reply #12 - 2008 June 10, 9:19 am
Delina Member
From: US Registered: 2008-02-12 Posts: 102

I had to open it with a more recent version of Excel - I couldn't see the characters in Excel 2000. I changed the .csv tag to .txt and imported it into Excel 2007. From there I saved it as an Excel 2000-2003 file and it now opens fine.

E-mail me and I can just send you my file.
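
For anyone who would rather not go through the rename-and-import steps, here is a rough sketch of another common workaround. It assumes the .csv is UTF-8 encoded (nothing in this thread confirms that) and that you have a version of Excel recent enough to recognize a byte-order mark; the function name and file names are made up for illustration.

```python
def add_utf8_bom(src, dst, src_encoding="utf-8-sig"):
    """Copy a text file to dst as UTF-8 with a leading byte-order mark (BOM).

    Assumption: the source file is UTF-8. Reading with "utf-8-sig" also
    accepts a file that already starts with a BOM, so none is duplicated.
    """
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding=src_encoding, newline="") as f:
        text = f.read()
    # Writing with "utf-8-sig" puts the BOM bytes EF BB BF at the start.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
```

Calling something like add_utf8_bom("sentences.csv", "sentences_bom.csv") and opening the second file may be enough for newer Excel versions to show the Japanese text. If the file is actually Shift_JIS or another encoding, pass that as src_encoding instead.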

Reply #13 - 2008 June 10, 9:28 am
PrettyKitty Member
From: USA Registered: 2007-07-02 Posts: 178

That worked. Thanks!

Reply #14 - 2008 June 15, 10:58 am
skylarth Member
From: USA Registered: 2007-08-08 Posts: 49

So for now I will probably just assume that the file is good until we hear otherwise...

In trying out the Anki file in tanukicorpusmod, I find that it takes a really long time for Anki to respond to my clicks. Has this happened to anyone else?

Reply #15 - 2008 June 15, 12:45 pm
resolve Member
From: 山口 Registered: 2007-05-29 Posts: 919 Website

It's quite fast here, and the new backend should definitely scale to 7000 cards.

Reply #16 - 2008 June 15, 12:52 pm
skylarth Member
From: USA Registered: 2007-08-08 Posts: 49

Hm. Maybe it is just my computer. Resolve, is there some sort of indexing that goes on the first time you open a big deck? My computer is usually fast enough with Anki that I have never thought about it..

Reply #17 - 2008 June 15, 1:31 pm
resolve Member
From: 山口 Registered: 2007-05-29 Posts: 919 Website

Taking a while to open the deck is usual. But per-answer time should be low.

Reply #18 - 2008 June 15, 1:48 pm
skylarth Member
From: USA Registered: 2007-08-08 Posts: 49

Well, it seems to have sped up somewhat. There is still a noticeable delay, but it is pretty fast. It probably is my computer.

Thanks resolve!

Reply #19 - 2009 April 20, 2:39 am
Nukemarine Member
From: 神奈川 Registered: 2007-07-15 Posts: 2347

Here's the link to the Tanuki file that I simplified and put into Google Documents.

http://spreadsheets.google.com/ccc?key= … k-1U1AWNMA

Out of interest, somebody want to try the following:

1. Create TTS audio of the Sentences
2. Upload all information into Smart.fm as 20 Japanese to Japanese decks
3. ???????
4. Profit

For those that like Japanese definitions, this list is hard to beat.

Reply #20 - 2009 April 20, 10:08 am
mafried Member
Registered: 2006-06-24 Posts: 766

I can do the TTS audio pretty quick... I'll just adapt the python script I used for JSPfEC.  Do you want just the sentences?  Or vocab+definition+sentence?

Reply #21 - 2009 April 20, 12:42 pm
Matthias Member
From: Germany Registered: 2005-10-27 Posts: 37

Nukemarine: I have seen a few times that you place a lot of hope in the Tanuki set, also in comparison to the iKnow sentences, which are criticized because their order is not "systematic" and they are hence too "difficult".

Tanuki's kanji order might indeed be more systematic. I have seen here an analysis comparing it to the order of either KIC or KO (statistic made by the usual suspect). But Tanuki is not organized from easy / frequent vocabulary to difficult / rare vocabulary.

I do have an active set of 406 random Tanuki sentences with about half of them out of the beginners section (first 1353 sentences or 249 kanji). Well to make it short, I like them a lot, because to me they are more interesting than the iKnow sentences (which were called here "uninspiring").

But if you compare the difficulty, then iKnow seems to me a lot easier because it contains straightforward sentences with frequently used vocabulary. And iKnow's ordering by frequency of vocabulary is a valid ordering criterion too.

Go ahead with Tanuki (perhaps you can also add pictures besides audio). But Tanuki is not per se easier, more systematic and better than iKnow.

Reply #22 - 2009 April 20, 1:26 pm
mafried Member
Registered: 2006-06-24 Posts: 766

Ask and ye shall receive.

Tanuki.zip

Now if only we could get stoked to do a one-sentence, one-picture thread... ;)

Reply #23 - 2009 April 20, 6:53 pm
rich_f Member
From: north carolina Registered: 2007-07-12 Posts: 1708

Tanuki follows almost exactly the Kanji in Context Order. (I think it only misses one of them.) I have both the Tanuki file and both volumes of KiC, and after checking both last year, I only found one discrepancy.
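
If anyone wants to repeat that kind of check without doing it by hand, a rough sketch follows. It assumes you have the sentences as a list of strings and a reference kanji order as a list with no repeats; both function names are made up, and the kanji test only covers the basic CJK block (U+4E00 to U+9FFF), so marks like 々 are ignored.

```python
def kanji_in_order(sentences):
    """Return each kanji once, in the order it first appears in the sentences."""
    seen = set()
    order = []
    for sentence in sentences:
        for ch in sentence:
            # Basic CJK Unified Ideographs block only (an assumption).
            if "\u4e00" <= ch <= "\u9fff" and ch not in seen:
                seen.add(ch)
                order.append(ch)
    return order


def compare_orders(observed, reference):
    """Return (missing, first_mismatch).

    missing: reference kanji that never appear in observed.
    first_mismatch: index of the first position where the two orders
    disagree, counting only kanji present in both lists; None if the
    shared kanji appear in the same relative order.
    """
    observed_set = set(observed)
    reference_set = set(reference)
    missing = [k for k in reference if k not in observed_set]
    shared_observed = [k for k in observed if k in reference_set]
    shared_reference = [k for k in reference if k in observed_set]
    first_mismatch = next(
        (i for i, (a, b) in enumerate(zip(shared_observed, shared_reference)) if a != b),
        None,
    )
    return missing, first_mismatch
```

Feeding the output of kanji_in_order into compare_orders along with the book's order would list any kanji the sentence file skips and show where, if anywhere, the two orders diverge.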

Reply #24 - 2009 April 21, 1:22 am
Nukemarine Member
From: 神奈川 Registered: 2007-07-15 Posts: 2347

Matthias, perhaps you misunderstand me. If these sentences are put into smart.fm, then they can be organized into a more useful format (i.e. KO2k1 order). Any application that utilizes smart.fm then has these items to draw from. It becomes yet another resource any of us can exploit based on our needs.

I'll be using the smart.fm core series myself. However, I'm a big believer in getting a variety of resources out there.

Ultimately, the best amalgamation I think will be: smart.fm word list, ko2k1 order, tanuki word definitions (in Japanese), native audio (currently limited to smart.fm's material), photos for the sentences. Now, by the time something like this comes up, I'll be beyond it. However, it will be there for those that follow us making their job easier, no?

Even then, I could still benefit: Take my anki deck (smart.fm list), do a smart switch where the smart.fm words are matched to the tanuki list so Japanese definitions match up to Core 2k, 6k words; add block in anki for Japanese definitions, display in answer field.

I did something similar with the Kanken list that was posted. I put all the definitions and yomi in my Anki RevTK file and had it display those too. I didn't start from scratch, but from here on out I get more out of my reviews of Kanji.
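
For what it's worth, the matching step mentioned above (looking up each deck word in the Tanuki list to pull in its Japanese definition) could look roughly like this. It is only a sketch: the "word" and "definition" field names are invented, and it assumes both lists are already loaded as lists of dicts, for example with csv.DictReader.

```python
def attach_definitions(deck_rows, definition_rows):
    """Return copies of deck_rows with a "definition" field added.

    Field names "word" and "definition" are hypothetical; adjust them to
    whatever columns the exported files really use.
    """
    by_word = {}
    for row in definition_rows:
        # If a word appears more than once, keep the first definition seen.
        by_word.setdefault(row["word"], row["definition"])

    merged = []
    for row in deck_rows:
        new_row = dict(row)  # copy so the original deck rows stay unchanged
        # Words with no match get an empty definition.
        new_row["definition"] = by_word.get(row["word"], "")
        merged.append(new_row)
    return merged
```

Exact string matching will miss words written differently in the two lists (kana versus kanji, for instance), so expect to clean up some unmatched entries by hand.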

Last edited by Nukemarine (2009 April 21, 1:29 am)

Reply #25 - 2009 April 21, 1:25 am
Nukemarine Member
From: 神奈川 Registered: 2007-07-15 Posts: 2347

Mafried,

The TTS is the easy part. I'm wondering about the ability to bulk upload these things to smart.fm.