
What are your thoughts on Tatoeba.org?

#1
I came across Tatoeba.org around 2010. I remember thinking it was neat, but I was concerned about its accuracy. I hadn't thought about the site for a few years until I stumbled across it again today.

What do you guys think of it?
Reply
#2
Yeah, it's neat, but with respect to Japanese, it just isn't growing very much. Half of the hits that you get on a query seem like they come from the Tanaka corpus (the same set of example sentences used by WWWJDIC and many free dictionaries). If I'm searching somewhere outside of my free dictionary app, I'm looking for -different- examples.

Everything in it has furigana, but that furigana seems to be automatically generated. When trying to pin down the differences between 入る、開く、開ける (いる・はいる、あく・ひらく、あける・ひらける) and the like, I thought it would be a guide for me, only to find out later that I had mis-learned some usages. So one thing that seemed really valuable (furigana for all examples) is actually worthless. You'll get -better- information from hovering over non-furigana examples with rikai(chan/sama/kun), because then you'll actually see all the possible kana spellings and not just the one, possibly mistaken, spelling the algorithm chose.

I -do- use the site sometimes when I look for example sentences, but it's my last resort after http://dic.yahoo.co.jp, http://www.alc.co.jp, and http://www.weblio.jp. In general I try to avoid the Tanaka corpus itself, because it's full of errors. I have no idea whether or not those errors are being corrected by tatoeba participants.
Edited: 2015-11-21, 10:20 pm
Reply
#3
I was using it in the beginning, but I asked the same question as you and was steered away from it completely (thankfully early in my studies) after being told that it contains errors. I didn't want to risk forming bad habits by learning a sentence that was wrong. I think you can filter it to show only the sentences added by native speakers, but again, I'm not sure how reliable those are either.

I would stick with the places Chris linked, as they are what I use too. :)
Reply
#4
Same. I hardly ever touch it now.

While you have to get to a certain level to use it effectively, I've found Kenkyuusha to be one of the best resources for learning usages. I'm especially fond of the way it distinguishes between different meanings.
Reply
#5
@Somecallmechris has some good suggestions, too. I've used all of those.

You could use newspaper websites. Google the words you're trying to find, and use the site: operator in Google search to limit the results to one website (note that there is no space after the colon). For example, put this in your Google search bar:
北朝鮮 site:shasetsu.ps.land.to

That will get you a bunch of newspaper editorial columns about North Korea, full of sentences that use the word correctly. But they're going to be hard to read, unless you're at N2 or N1 level.
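
If you run this kind of search often, the query URL can be built in a few lines. This is only a minimal sketch in Python, assuming the usual https://www.google.com/search?q=... URL form; the helper name site_search_url is made up for illustration and is not part of any library.

```python
from urllib.parse import urlencode


def site_search_url(word: str, site: str) -> str:
    """Build a Google search URL restricted to one website.

    site_search_url is a hypothetical helper written for this example.
    """
    # The operator is written as site:example.com, with no space after the colon.
    query = f"{word} site:{site}"
    # urlencode percent-encodes the Japanese text, the space, and the colon.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("北朝鮮", "shasetsu.ps.land.to"))
```

Paste the printed URL into a browser, or swap in any other word and site.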

Or you could go EPWING. Get EBWin or another EPWING file reader, then get some EPWING dictionaries. You can get EIJIRO/WAEIJIRO here: http://www.japaneselanguagetools.com/busDL.html

EPWING2Anki is a great program if you want to dump a bunch of sentences into Anki from an EPWING dictionary + word list. (It won't work with EIJIRO, but it will work with other dictionaries.) The problem is finding the dictionaries. You might be able to find some download versions on the Vector download shop, here:
http://www.vector.co.jp/magazine/softnews/?tab
Reply
#6
(2015-11-21, 10:16 pm)SomeCallMeChris Wrote: In general I try to avoid the Tanaka corpus itself, because it's full of errors. I have no idea whether or not those errors are being corrected by tatoeba participants.

<jake`> lool0 tried to remove the terrible jap stuff from tatoeba
<jake`> aka the tanaka corpus
<jake`> but he failed in his quest
<jake`> like everything else he does
<lool0> ;_;
<lool0> someone else did mot of the work though
<lool0> *most
<jake`> what work
<lool0> so now I need to just introduce magic to hide the rest
<jake`> since when did you do any tatoebaing
<jake`> oh
<jake`> its actually been removed?
<jake`> when did that happen
<lool0> no some very motivated japanese person sorted through almost entire corpus
<lool0> +the
<lool0> and rated it
<lool0> but it's sitting in some table not being used
<jake`> so when will it be used
<lool0> idk
Reply
#7
(2015-11-22, 4:17 pm)jakep Wrote:
(2015-11-21, 10:16 pm)SomeCallMeChris Wrote: In general I try to avoid the Tanaka corpus itself, because it's full of errors. I have no idea whether or not those errors are being corrected by tatoeba participants.

<jake`> lool0 tried to remove the terrible jap stuff from tatoeba
<jake`> aka the tanaka corpus
<jake`> but he failed in his quest
<jake`> like everything else he does
<lool0> ;_;
<lool0> someone else did mot of the work though
<lool0> *most
<jake`> what work
<lool0> so now I need to just introduce magic to hide the rest
<jake`> since when did you do any tatoebaing
<jake`> oh
<jake`> its actually been removed?
<jake`> when did that happen
<lool0> no some very motivated japanese person sorted through almost entire corpus
<lool0> +the
<lool0> and rated it
<lool0> but it's sitting in some table not being used
<jake`> so when will it be used
<lool0> idk

+1, I agree with Chris. I find that the site has too many mistakes.
Example: the furigana for 9時 (くじ) is given as きゅうじ instead.

For programmers working with the data, "word-to-word translation" sentences may be useful. But they don't give the appropriate meaning or explain the nuance behind it, which can be misleading and cause misunderstandings.

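An excerpt from 「英語生活マナーブック」:
In the Japanese way of thinking, using a negative form when making a request, as in 「・・・してもらえませんか」, "Won't you ...?" or "Couldn't you ...?", is considered more polite. That does not hold for English, however. In English, a request phrased in the negative sounds either pushy or pleading. It comes across as if you were saying "there should be no reason for you to refuse my request" or "it is only natural for you to do something for me." This conveys a nuance contrary to what was intended.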
Reply
#8
I think it's good because it provides a lot of example sentences,
but I'm not so sure about the accuracy.
Reply
#9
I seldom use it because I doubt the accuracy of the examples
Reply
#10
(2015-12-03, 4:09 am)laurenk Wrote: I seldom use it because I doubt the accuracy of the examples

But this problem is simple to solve. Just take the examples you think are good enough and post them on Lang-8. A native speaker will correct them if necessary, and then you have sentences to use in Anki.
Reply
#11
(2015-12-12, 7:21 pm)jahnke Wrote:
(2015-12-03, 4:09 am)laurenk Wrote: I seldom use it because I doubt the accuracy of the examples

But this problem is simple to solve. Just take the examples you think are good enough and post them on Lang-8. A native speaker will correct them if necessary, and then you have sentences to use in Anki.

You could do this occasionally but this is not what I would consider "fair use" of Lang-8. The site is about correcting your production, not checking a corpus of sentences like Tanaka's. One further disadvantage is the lack of feedback to the original database, which means that the same sentences will need to be corrected again and again.
Reply
#12
You could try 知恵袋.
Reply
#13
I am using the "Japanese example sentences" plugin in Anki, which is based on some version of the Tanaka corpus. I have noticed that some sentences are displayed as "checked". Is this from Tatoeba or from some other source?
The ratio of "checked" to "unchecked" sentences is rather low but one might expect it to increase with time.
I have just reloaded the database today from
http://ftp.ftp.monash.edu.au/pub/nihongo...les.utf.gz
and there does not seem to be much improvement, though.
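
For anyone who wants to put a number on that ratio, here is a rough sketch in Python. It rests on assumptions that are not confirmed anywhere in this thread: that each sentence pair in the file is an "A: " line followed by a "B: " line, and that a "~" after a word on the B line marks the pair as checked. Whether the plugin's "checked" label corresponds to that marker is also an assumption. The function names are made up for illustration.

```python
import gzip
from typing import Iterable, Tuple


def count_checked(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (checked, total) sentence pairs.

    ASSUMPTION: each sentence pair is an "A: " line followed by a "B: " line,
    and a "~" on the B line marks that pair as checked.
    """
    checked = total = 0
    for line in lines:
        if line.startswith("B: "):
            total += 1
            if "~" in line:
                checked += 1
    return checked, total


def count_checked_in_file(path: str) -> Tuple[int, int]:
    # ASSUMPTION: the download is gzip-compressed UTF-8 text.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_checked(f)


if __name__ == "__main__":
    # Made-up sample lines in the assumed layout, not real corpus entries.
    sample = [
        "A: 彼は本を読む。\tHe reads a book.#ID=1_2\n",
        "B: 彼 は 本 を 読む~\n",
        "A: 猫が好きです。\tI like cats.#ID=3_4\n",
        "B: 猫 が 好き です\n",
    ]
    checked, total = count_checked(sample)
    print(f"{checked} of {total} sentence pairs have a checked word")
```

Running count_checked_in_file on two downloads taken some months apart would show whether the share of checked sentences is actually growing.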
Reply