I wanted to start studying intonation/accent, so I added accent markings to all of Core10000. It took me 15+ hours, so I hope someone else can benefit from my work.
https://docs.google.com/spreadsheet/ccc?...3FtbXU3WXc
There are about 9600 cards in this deck. I got about 9000 of the accent markings using my program DictScrape, and I had to add the remaining 600 or so by hand.
Just when I thought I was finished, I realized that my program might not have been working correctly when there was no kanji in the "Target Kanji" field, so I had to go back and double check about another 1000 cards that only have kana (no kanji).
Are there many mistakes?
I don't think there are that many. I checked about 1500 cards by hand, so I could have made a mistake when updating those cards. For the other cards, I think the program did a pretty good job.
There may theoretically be a mistake if there are two separate words listed in the dictionary that have the same Kanji + Reading (kana) but have different accents. In my experience, I don't think this is very common (in fact, I don't think it happens at all in the Daijirin).
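If anyone wants to check for that case themselves, here is a rough sketch of how it could be done. It is not part of DictScrape; the function name `find_ambiguous` and the (kanji, reading, accent) tuple layout are just assumptions for illustration, and the sample rows are placeholders rather than real dictionary data.

```python
from collections import defaultdict

def find_ambiguous(entries):
    """Return {(kanji, reading): [accents]} for every kanji + reading
    pair that shows up with more than one distinct accent string.

    `entries` is an iterable of (kanji, reading, accent) tuples.
    This layout is hypothetical, not DictScrape's real format.
    """
    seen = defaultdict(set)
    for kanji, reading, accent in entries:
        seen[(kanji, reading)].add(accent)
    # Keep only the pairs that map to two or more different accents.
    return {key: sorted(accents)
            for key, accents in seen.items()
            if len(accents) > 1}

if __name__ == "__main__":
    # Placeholder rows, not real accent data.
    sample = [
        ("WORD-A", "reading-a", "0"),
        ("WORD-A", "reading-a", "1"),
        ("WORD-B", "reading-b", "2"),
    ]
    for key, accents in find_ambiguous(sample).items():
        print(key, accents)
```

Any pair this reports would be a card worth checking by hand.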
What should I do if I find a mistake?
Post it here! I will love you for it.
Where did you get these accents?
DictScrape queries Yahoo's Daijirin dictionary and parses out the accent for an entry. When going through and checking the cards by hand, I used a combination of NHK 日本語発音アクセント辞典, 新明解国語辞典 第五版, and 三省堂 スーパー大辞林.
Some words have multiple accents listed. Which one did you use?
For the cards processed automatically, all available accents are listed. For instance, if the accent for a card is 01, then the 0 accent is more common, and the 1 accent is less common (at least, I think this is the way Daijirin does it).
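To make the notation concrete, here is a small sketch of how an accent field like 01 could be split into its separate accents. The function name `split_accents` is made up for this example, and it assumes every accent is a single digit, so an accent of 10 or higher could not be represented this way.

```python
def split_accents(field):
    """Split an accent field such as "01" into [0, 1].

    Assumption: one digit per accent, listed most common first.
    An empty field gives an empty list.
    """
    field = field.strip()
    accents = []
    for ch in field:
        if ch not in "0123456789":
            raise ValueError(f"unexpected character in accent field: {ch!r}")
        accents.append(int(ch))
    return accents

if __name__ == "__main__":
    print(split_accents("01"))  # the first item is the more common accent
```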
Some words have different accents depending on how the word is used. For instance, a word might have a 0 when it is used as a noun and a 1 when it is used as an adverb. My program doesn't make that distinction. The accents are just listed as 01 as normal. Sorry :-(.
For the cards that I checked/updated by hand, I used the accent that was in common between the three dictionaries I used. For instance, sometimes Daijirin would list a word as being 01, but the NHK dictionary would only list it as 0. In that case, I would just use 0.
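I did this step by hand, but the rule can be written down as a short sketch. The function name `common_accents` is hypothetical, and treating an empty result as "needs a manual decision" is my assumption for the example.

```python
def common_accents(primary, *others):
    """Keep the accents from `primary`, in their original order,
    that also appear in every one of the `others` lists.

    An empty result means the dictionaries have no accent in common,
    so the card would need a manual decision (an assumption here).
    """
    return [a for a in primary if all(a in other for other in others)]

if __name__ == "__main__":
    daijirin = [0, 1]  # listed as 01
    nhk = [0]          # listed as 0 only
    print(common_accents(daijirin, nhk))
```

With the 01 versus 0 case above, only the 0 accent survives.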
What are your Anki templates going to look like?
I plan on doing accent production two ways. One way will be just looking at the Kanji (and maybe kana?). The other way will just be hearing the word (I imagine this will get too easy after a while...?). I also plan on listening to the whole sentence and trying to repeat it, while really concentrating on the accent. This will be hard to grade myself on.
Where can I get the audio?
Look for a file online called Core10Kv4.7z.
Doesn't the accent for some words change depending on where the word is in the sentence?
Yes, I believe that is true. You should ask AlexanderC about it ;-p (I'm looking forward to that 3rd youtube video!)
Edited: 2012-12-08, 1:30 am
