Example Sound Files

#1
With the popularity of the methods described on the AJATT website and so forth, a lot of people are collecting sentences to study from. There are also various ways to obtain sentences (from wwwjdic, kodonline, etc.). However, there's always the worry of where the sentence came from, who wrote it, what context it's appropriate in, etc. There's also the problem that when you try to say the sentence, you'll get the intonation all wrong.

So why not collect transcribed sound files instead? It would be far easier to tell if the speaker is native Japanese or not, it would naturally focus more heavily on spoken Japanese rather than written Japanese, you could learn the pronunciation and intonation at the same time...

The bad point seems to be the space required to hold a large collection of sound files. However, the biggest strength seems to be that foreigners could contribute with confidence. Get your favourite TV program, film, or radio program and use Audacity or something similar to sample ten sound clips. If everyone does that, we start to build a sizeable collection...

I'm thinking about proposing it to the folks at wwwjdic. The Tanaka corpus has improved a lot but you'll never know if the sentence was originally Japanese or originally English. With contributed sound files, there's no doubt.

What do other people think?
#2
One of the guys who is part of the KO2001 project is currently putting together sound files for all the sentences, read by natives......
#3
No offense to anyone working on a sound file project, but I doubt it'll be of much use. Non-professional speakers will not read naturally. Try reading a list of words off a piece of paper and then playing back a recording of it: it's not natural. Combine that with regional dialect variations. It's also unlikely that it'll ever cover a significant number of words. Finally, a word's pronunciation also differs depending on whether another word follows it, etc.

What I recommend, if you aren't afraid of spending a little money, is the official NHK accent dictionary. http://www.amazon.co.jp/NHK日本語発音アクセント辞典-...42&sr=11-1

Amazon doesn't seem to have it in stock at the moment, but maybe you can find it somewhere else by its ISBN. It's not that expensive, if I recall. It has sound files for 70,000+ words in the standard NHK pronunciation, which is basically the standard for Japanese pronunciation. It has multiple sound files for each word, one by itself and one with the particle を following it, which affects the pitch. It can be converted to EPWING format to use on a PPC or in a multidictionary app like Kotonoko. It also displays a pitch diagram for the proper pitch accent of a word. The only downside, I suppose, is that it only has a male voice.
Edited: 2008-05-20, 7:16 am
#4
Well, just to clarify, the KO2001 project is whole sentences, not single words.
#5
@Jarvik7

I actually tried looking for that CD set a while back, and I think it's out of print, because I can't find it anywhere. If anyone finds a copy or two, let me know.
#6
Has anyone had any luck finding the "NHK accent dictionary"? I thought I might have stumbled upon a smaller modified online version of it at:

http://www.saiga-jp.com/kanji_dictionary.html

However, I am probably wrong, as Jarvik7 says it is made up entirely of male speech, but that version appears to be female.
#7
wrightak Wrote:So why not collect transcribed sound files instead?

What do other people think?
 That's a great idea!

I don't know if this is what you mean, but check these podcasts:
情熱大陸+p
Japanese Listening (Advanced)
Nippon Voiceblog

They are all in Japanese by Japanese speakers (JLA has a non native speaker too) and they all have full transcripts.

 For more podcasts you can also check:
Voiceblog
Podcast Rank
Castella


I would also love a collection of Japanese transcriptions for dorama, movies, animation, musical videos... and even Japanese subtitles for occidental movies would be great!
Edited: 2008-06-10, 7:45 pm
#8
Transtic Wrote:I would also love a collection of Japanese transcriptions for dorama, movies, animation, musical videos... and even Japanese subtitles for occidental movies would be great!
I'm told that http://www.dramanote.com/ is a good source for J-drama scripts.
#9
I'm doing this at the moment. I'm going through マイボスマイヒーロー(http://www.dramanote.com/?eid=487425#sequel) cutting up the sound files (using Audacity) and linking them to the sentences in drama note (which seem word for word correct so far - if I'm in doubt I just skip it)

I think after this episode I'll probably move to Gokusen as the Japanese used there seems much easier to understand. Though the reason I chose My Boss My Hero was that it has a male lead and that's the type of speech I want to reproduce.

I would happily make use of a list of mp3 snippets with a Japanese transcription (and possibly an English meaning too). Though I think it's a lot of work and hard to organize!
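
For anyone who would rather script the cutting than do it by hand in Audacity, here is a minimal sketch using only Python's standard `wave` module. It assumes the episode audio has already been saved as an uncompressed WAV file and that you know the start and end time of each line; the function name `cut_clip` and the file names in the usage note are made up for illustration.

```python
import wave


def cut_clip(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (in seconds) into a new WAV file.

    Returns the number of audio frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start = int(start_s * rate)
        # Clamp the end so we never read past the end of the source file.
        end = min(int(end_s * rate), src.getnframes())
        if not 0 <= start < end:
            raise ValueError("clip range is empty or outside the file")
        src.setpos(start)
        frames = src.readframes(end - start)

    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width and rate as the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end - start
```

Something like `cut_clip("episode01.wav", "ep01_0001.wav", 83.2, 86.9)` would then give one clip per transcript line. Converting the clips to mp3 afterwards would need a separate tool.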
#10
There are also some Japanese subtitles for dramas here: http://www.d-addicts.com/forum/subtitles.php#Japanese
Edited: 2008-06-11, 2:34 pm
#11
I couldn't get Audacity to rip audio straight from a video file; first I had to extract the audio from the video, then open that audio in Audacity. It's not too painful (I describe my process here: http://howtojapan.blogspot.com/2008/05/r...o-and.html).
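
One possible way to do the "extract the audio first" step from a script is to call ffmpeg, sketched below. This is an assumption on my part and not necessarily the tool used in the linked write-up: it needs ffmpeg installed and on your PATH, and the function names and file names are hypothetical.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path):
    """Build an ffmpeg command that writes only the audio track as 16-bit WAV."""
    return [
        "ffmpeg",
        "-i", video_path,        # input video file
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM, which Audacity opens directly
        "-ar", "44100",          # 44.1 kHz sample rate
        "-ac", "2",              # stereo
        wav_path,
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path), check=True)
```

Calling `extract_audio("episode01.avi", "episode01.wav")` should leave a WAV file that can be loaded into Audacity and cut up from there.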
#12
Balaam Wrote:I'm doing this at the moment. I'm going through マイボスマイヒーロー(http://www.dramanote.com/?eid=487425#sequel) cutting up the sound files (using Audacity) and linking them to the sentences in drama note (which seem word for word correct so far - if I'm in doubt I just skip it)
This sounds great. Would you be willing to share these sound-file and transcript pairs?
#13
I've recorded my co-workers reading the sample sentences out of my 2kyu grammar text and refined them into clips that correspond to flashcards in Mnemosyne (http://www.mnemosyne-proj.org/). It has been an awful lot of work to make the cards, but in a very short time, I have memorized them. I catch them in the office, while watching the news, etc.

The sound clips aren't perfect: the readers aren't voice actors, and sometimes you can hear the phone ring or other people talking, but it is a very effective method for getting the grammar to stick in your head. The best part has been that Mnemosyne decides the intervals between your cards, so you don't waste time reviewing cards you've mastered (something important when dealing with over 450 grammar examples).
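
To give a rough idea of how a program can decide those intervals, here is a sketch of the classic SM-2 scheduling rule. As far as I know Mnemosyne's scheduler is derived from SM-2 with its own modifications, so treat this as an illustration of the general idea and not as Mnemosyne's actual code; the function name and the starting values in the usage note are assumptions.

```python
def sm2_review(repetitions, interval_days, ease, quality):
    """Apply one SM-2-style review and return the updated card state.

    quality is the self-assessed grade from 0 (blackout) to 5 (perfect).
    Returns (repetitions, interval_days, ease).
    """
    if quality < 3:
        # Failed recall: start the card over, leave the ease factor unchanged.
        return 0, 1, ease

    if repetitions == 0:
        interval_days = 1
    elif repetitions == 1:
        interval_days = 6
    else:
        # Later intervals grow by the ease factor, so easy cards come up rarely.
        interval_days = round(interval_days * ease)

    # Adjust the ease factor by how hard the recall felt, never below 1.3.
    ease = ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    ease = max(1.3, ease)
    return repetitions + 1, interval_days, ease
```

Starting a new card at `(0, 0, 2.5)` and grading it 4 each time, the gaps between reviews grow from 1 day to 6 days to 15 days, which is why mastered cards stop eating review time.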

I save one big long list of all the cards as an HTML file and load it into Firefox with Rikaichan any time I come across a word I can't catch.

I posted some older versions (since deleted) of the cards in another thread using RapidShare, but since no one else has posted any flashcards, I'm reluctant to share the rest of them. However, if people who cut up sentences from J-dramas, movies, etc. want to share some things, I'd be more than willing to share my hours and hours of work.

(Now, there are some copyright issues, but honestly, the books are still valuable because of the exercises that correspond to them, and I really don't care.)


The text is called 日本語総まとめ問題集2級 and is divided into 8 weeks. The focus of the flash cards is to be able to hear and understand the grammar; that's why they are ordered as they are.
http://rapidshare.com/files/121928086/week_8.zip.html

I honestly hope that we can start something where sound-clip-driven flash cards are shared, because it's the quickest way to fluency. My Japanese has skyrocketed since I started using this method.
Edited: 2008-06-12, 8:51 am
#14
wrightak Wrote:
Balaam Wrote:I'm doing this at the moment. I'm going through マイボスマイヒーロー(http://www.dramanote.com/?eid=487425#sequel) cutting up the sound files (using Audacity) and linking them to the sentences in drama note (which seem word for word correct so far - if I'm in doubt I just skip it)
This sounds great. Would you be willing to share these sound-file and transcript pairs?
There's no easy way for me to get at them, as I have been adding them to my deck pretty much at random. I'm also going through "Basic Japanese sentences patterns" (a book) and anything interesting I find on the net at the same time. I could make my entire deck available; it's about 1000-odd cards, if that sounds helpful (it's in Anki format)?
#15
Balaam Wrote:I could make my entire deck available; it's about 1000-odd cards, if that sounds helpful (it's in Anki format)?
I would be interested.
#16
Does anybody know the easiest free way to rip audio from a DVD? Can you use virtualdubmod?
#17
zazen666 Wrote:One of the guys who is part of the KO2001 project is currently putting together sound files for all the sentences, read by natives......
Unless I'm missing something, I thought the only audio project for KO involved the use of a Japanese text-to-speech program. I don't remember hearing anything about native speakers, although that would be cool too.

However, I just tried out the sample deck and it really is awesome. It doesn't sound robotic at all, and even with the occasional mistake, it still blows my mind. IMO, more people should be getting on the text-to-speech bandwagon.

Cutting sentences from movie audio is great because you are assured 100 percent accuracy, but it will never be as easy or flexible as a TTS engine.
Edited: 2008-06-12, 5:11 pm
#18
Balaam Wrote:I could make my entire deck available; it's about 1000-odd cards, if that sounds helpful (it's in Anki format)?
Yep, that sounds great, thank you. In order to get at them, we'll need to add a tag to the cards that have sound files. We can then export all of the cards with that tag to a text file and that text file can then be manipulated in a spreadsheet to the user's taste and then imported into other decks. If you have a specific model that is used for these cards then a tag will likely be appended already. I'm happy to do all the work though so if you send your deck to me then that would be great. My email address is on this page: http://wrightak.googlepages.com/afterrtk12 Please zip up the deck with the sound files.
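
As a rough sketch of the "export, then filter" step, the snippet below reads a tab-separated text export and keeps only the cards that carry a given tag and reference a sound file. The `[sound:...]` marker is how Anki refers to audio inside a field; the three-column layout (expression, meaning, tags), the tag name, and the function name are assumptions for illustration and would need adjusting to the real export.

```python
import csv
import re

# Anki marks audio inside a field as [sound:filename].
SOUND_RE = re.compile(r"\[sound:([^\]]+)\]")


def cards_with_sound(lines, tag):
    """Return (sound_file, text, meaning) for tagged cards that have audio.

    Each line is assumed to be: expression <TAB> meaning <TAB> space-separated tags.
    """
    pairs = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 3:
            continue  # skip malformed or short lines
        expression, meaning, tags = row[0], row[1], row[2].split()
        if tag not in tags:
            continue
        match = SOUND_RE.search(expression)
        if match is None:
            continue  # tagged, but no audio attached
        # Strip the sound marker so only the transcription text is left.
        text = SOUND_RE.sub("", expression).strip()
        pairs.append((match.group(1), text, meaning))
    return pairs
```

With the export opened as `open("export.txt", encoding="utf-8")`, the returned list can be written back out with `csv.writer` and opened in a spreadsheet.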
#19
duder Wrote:I honestly hope that we can start something where sound-clip-driven flash cards are shared, because it's the quickest way to fluency. My Japanese has skyrocketed since I started using this method.
Me too. It's great to hear that it works in practice because I have yet to build a sizeable collection to see the effects. Creation of this sort of flash card takes time and collaboration can be of huge benefit I think. Thanks for posting the link, I'll check it out.
#20
A little off-topic but,

imagine how useful a near-flawless TTS system would be for learning. TTS has come a long way in the last decade or so. In another 20 years, I wonder if it can get pretty darn close to a native speaker.

Given that long time gap, it doesn't do those of us learning right now much good, but it's still cool to think about. Interacting with others is what makes language great, but an almost flawless TTS would help so much!
#21
nest0r Wrote:I'm completely happy with using TTS for AJATT and then getting tonnes of unstructured (non-SRS) input from other sources (movies, dorama, music, et cetera), so instead of going through the trouble of mining audio, I just make nearly flawless audio myself for whatever kanji sentences I want.
I think this depends a lot on what you're trying to achieve. If you want to know what a written sentence might sound like when spoken then TTS may be a good option. If the pronunciation can fool natives, as you say, then listening to it might be a good exercise for improving your pronunciation.

However, the point of the exercise that I proposed in the original post is to use sound files as source material instead of (or in addition to) written sentences. As source material, it has to be Japanese that has been naturally spoken by a Japanese speaker. If you transcribe spoken Japanese and then use TTS to generate a sound file, you'll have different pronunciation from the original. This could be valuable to you but I think that having a sound file of the original would be much more valuable.

Listening to Japanese people read texts aloud is a nice thing to have but I am most enthusiastic about obtaining sound files of natural spoken Japanese (from quiz shows, radio chats etc.).
#22
@wrightak

I will try to start on this project today. I'm not really doing much this summer anyway, and I can easily see myself doing hundreds of these clips if it's as easy as it sounds.

But before I start, does anybody know of websites that have a large amount of original Japanese movie/anime scripts or Japanese subtitle files? I know there is already a thread on this for individual movies and shows, but shouldn't there be a big website somewhere with a large database of such files? English subtitles are incredibly easy to find, but I have seen very few Japanese ones. I'm definitely not skilled enough to be able to transcribe anything more than simple Japanese sentences by myself.
#23
Dragg Wrote:I know there is already a thread on this for individual movies and shows, but shouldn't there be a big website somewhere with a large database of such files?
Great to hear. I think that if we can build a collection then it would be a brilliant resource to everyone.

With regards to scripts, I'm unaware of a large database but if you tell me what you want to watch/listen to then we can go about hunting for a transcription.

The other alternative is that you collect sound files you don't understand, post them to some communal area and people submit transcriptions. If we can get some kind of wiki framework going then that would be brilliant.
#24
Looks like it's gonna be pretty tough to find transcripts. I just spent about an hour looking, and I couldn't find anything that I wanted.

I was looking for scripts or subs for: Berserk, Serial Experiments Lain, or Lupin the 3rd: Castle of Cagliostro. I even tried using the Japanese titles along with search terms like 脚本 and 字幕. If anybody has any other ideas let me know.

A Wiki-thing sounds good, but I wouldn't have any idea where to start as far as making one. Well, I will start the audio-cutting for now and see how that goes along first before worrying about the next step.
Edited: 2008-06-13, 7:52 pm
#25
wrightak Wrote:
duder Wrote:I honestly hope that we can start something where sound-clip-driven flash cards are shared, because it's the quickest way to fluency. My Japanese has skyrocketed since I started using this method.
Me too. It's great to hear that it works in practice because I have yet to build a sizeable collection to see the effects. Creation of this sort of flash card takes time and collaboration can be of huge benefit I think. Thanks for posting the link, I'll check it out.
I don't have that many sound files, as I've only recently started, but what I do have is available on the Anki wiki: http://ichi2.net/anki/wiki/ExtraDecks (the "1000ish example sentences" link).