large sentence collections

spinoza99 Member
Registered: 2008-07-28 Posts: 10

hi everyone

given that a lot of people here are searching for sentences i decided to google "parallel corpora" and - in a roundabout way - found two interesting sites. my apologies if they've been posted here before!!

the first webpage is

http://www2.nict.go.jp/x/x161/members/m … -pages.htm

click on the literature title you're interested in and then click on the link 対訳データ

you then get the text you selected, with each english sentence/phrase aligned with its corresponding japanese sentence/phrase. for example, i chose "the great gatsby". here are the first two lines:

ぼくが今より若くて今より傷つきやすかった時代に父から受けた一種の忠告を、ぼくは何度も心の中で繰りかえしながら生きてきた。
In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.

「他人のことをとやかく言いたくなったときはいつでもね、この世の誰もがおまえほどに恵まれた生き方をしてるわけじゃないと思い出すことだ」
Whenever you feel like criticizing any one, he told me, just remember that all the people in this world haven't had the advantages that you've had.

i saved the webpage as a .txt file and after a couple of hours of editing (jumping between Word and Excel) i had "the great gatsby" in a tab-delimited file. that's 5000 sentences!!
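For anyone who would rather script that step than do it by hand, here is a minimal sketch in Python. It assumes the saved text strictly alternates one Japanese line and one English line, as in the excerpt above; the real page may contain headers or multi-line entries that need cleaning up first. The function names `pair_lines` and `to_tsv` are made up for this example.

```python
import csv
import io


def pair_lines(text):
    """Pair each Japanese line with the English line that follows it.

    Assumption: after dropping blank lines, the text strictly alternates
    Japanese, English, Japanese, English, ...
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # zip() silently drops a trailing unpaired line, if there is one.
    return list(zip(lines[0::2], lines[1::2]))


def to_tsv(pairs):
    """Render (japanese, english) pairs as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buf.getvalue()


sample = """
ぼくが今より若くて今より傷つきやすかった時代に父から受けた一種の忠告を、ぼくは何度も心の中で繰りかえしながら生きてきた。
In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.

「他人のことをとやかく言いたくなったときはいつでもね、この世の誰もがおまえほどに恵まれた生き方をしてるわけじゃないと思い出すことだ」
Whenever you feel like criticizing any one, he told me, just remember that all the people in this world haven't had the advantages that you've had.
"""

pairs = pair_lines(sample)
tsv_text = to_tsv(pairs)
print(tsv_text)
```

The `csv` module wraps a field in double quotes if it contains a tab or a double quote, so the output stays loadable in a spreadsheet even when a sentence contains quotation marks.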

the second webpage is this:

http://www2.nict.go.jp/x/x161/members/m … index.html

there are two compressed text files which between them contain around 70,000 (!!) sentences from Reuters. again, the good news is that the english and japanese sentences are already in a format which lends itself to easy editing. for example:

<T SCORE=0.957704654895666>
<J>★パキスタン、独立50周年を祝う。</J>
<E>* Pakistan celebrated the 50th anniversary of its independence.</E>
</T>
<T SCORE=0.918151147098519>
<J>政府の構造や組織を変更すべきだ。</J>
<E>The structure and composition of the government must be changed.</E>
</T>
<T SCORE=0.872736137907894>
<J>ドイツ石油企業のあるシニア・トレーダーは、「すべて売却された」と述べた。</J>
<E>"Everything was sold," said a senior trader for a German oil company.</E>
</T>
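
Records in this shape can be pulled apart with a short script. The sketch below uses a regular expression rather than an XML parser, because the unquoted `SCORE=...` attribute means the data is not well-formed XML. Two assumptions on my part: that every record follows the `<T SCORE=...>`, `<J>`, `<E>` layout shown above, and that a higher SCORE means a more reliable alignment (the excerpt does not say). `parse_records` and `min_score` are names invented for this example.

```python
import re

# One record: <T SCORE=x> followed by a <J>...</J> and an <E>...</E> element.
# The closing </T> is not required, so a record cut off at the end still matches.
RECORD = re.compile(
    r"<T SCORE=([0-9.]+)>\s*<J>(.*?)</J>\s*<E>(.*?)</E>",
    re.DOTALL,
)


def parse_records(text, min_score=0.0):
    """Return (score, japanese, english) tuples with score >= min_score."""
    rows = []
    for score, ja, en in RECORD.findall(text):
        value = float(score)
        if value >= min_score:
            rows.append((value, ja.strip(), en.strip()))
    return rows


sample = """
<T SCORE=0.957704654895666>
<J>★パキスタン、独立50周年を祝う。</J>
<E>* Pakistan celebrated the 50th anniversary of its independence.</E>
</T>
<T SCORE=0.918151147098519>
<J>政府の構造や組織を変更すべきだ。</J>
<E>The structure and composition of the government must be changed.</E>
</T>
<T SCORE=0.872736137907894>
<J>ドイツ石油企業のあるシニア・トレーダーは、「すべて売却された」と述べた。</J>
<E>"Everything was sold," said a senior trader for a German oil company.</E>
</T>
"""

all_rows = parse_records(sample)
high_rows = parse_records(sample, min_score=0.9)

for score, ja, en in high_rows:
    print(f"{score:.3f}\t{ja}\t{en}")
```

Raising `min_score` is one way to cut the collection down to a manageable size, assuming the score really does track alignment quality.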

obviously these two webpages yield more sentences than you need, but even if you pull out just a couple of hundred sentences ...

hope this helps!!

keep up the good work everyone

Last edited by spinoza99 (2008 September 29, 7:40 am)

wccrawford Member
From: FL US Registered: 2008-03-28 Posts: 1551

Not trying to discourage you, but when this kind of thing has been discussed before, the end result is that translated works rarely sound natural to a native speaker.  Things are said oddly and it's not a great way to practice.  Worse, you don't really know which things sound odd and which don't.

If you found the opposite, where Japanese texts were translated to English line by line, that would be a lot more helpful to your studies.

snispilbor Member
From: Ohio USA Registered: 2008-03-23 Posts: 150 Website

While wccrawford is right about E->J translations, I'll add this much:  sometimes you need to find sentences for a specific, obscure word, and sometimes not even SpaceALC has many examples for that word.  If these texts are searchable, it could be great for finding more sentences for a particular given word!
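
If the sentences end up in a tab-delimited file (Japanese, tab, English), searching them for a given word takes only a few lines. This is a rough sketch: `load_pairs` and `find_sentences` are hypothetical helper names, and the match is a plain substring test, since Japanese has no spaces to split on. That means a short word will also match inside longer words.

```python
def load_pairs(lines):
    """Turn 'japanese<TAB>english' lines into (japanese, english) tuples."""
    pairs = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:  # skip blank or malformed lines
            pairs.append((parts[0], parts[1]))
    return pairs


def find_sentences(pairs, word, limit=10):
    """Return up to `limit` pairs whose Japanese side contains `word`."""
    hits = [(ja, en) for ja, en in pairs if word in ja]
    return hits[:limit]


# A few lines in the assumed tab-delimited layout, taken from the thread.
lines = [
    "★パキスタン、独立50周年を祝う。\t* Pakistan celebrated the 50th anniversary of its independence.\n",
    "政府の構造や組織を変更すべきだ。\tThe structure and composition of the government must be changed.\n",
    "\n",
]

pairs = load_pairs(lines)
hits = find_sentences(pairs, "政府")

for ja, en in hits:
    print(ja)
    print(en)
```

In practice the `lines` list would come from reading the file, for example with `open(path, encoding="utf-8")`, where `path` is wherever the file was saved.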

CaLeDee Member
Registered: 2008-08-31 Posts: 170

Doesn't that depend on the skill and style of translation the translator is capable of? I'm sure a sufficiently skilled translator can sound natural in both English and Japanese. I know it's hard to translate and get the exact meaning across without resorting to using words that simply aren't even used in the original text. I doubt all E->J translations sound unnatural though. Really depends on the translator, no?

vgambit Member
Registered: 2007-06-21 Posts: 221

Well, if the National Institute of Information and Communications Technology can't get a translation right, who can?

Tobberoth Member
From: Sweden Registered: 2008-08-25 Posts: 3364

From what we've seen, it isn't really a question of skill most of the time. The translators purposely make the text sound western-styled; it seems Japanese people prefer to read translated works like that.
