Pitch Accent Database/Resource in Spreadsheet format?

#1
I was wondering if anyone knows of a pitch accent database that is in some format where all the words are listed, along with their pitch accent.
I wanted to create a database of words with their pitch accent, in order to use them for a simple Anki addon.

There's currently a pitch accent addon for Anki ( https://ankiweb.net/shared/info/932119536 ) but I don't like its format; I prefer pitch accent numbers. The addon is also missing a lot of words.

I found this database: https://raw.githubusercontent.com/javdej...nicode.csv but I can't figure out which column actually contains the pitch accent, so if someone could decode it, that'd be cool.

Thanks!
#2
From https://github.com/javdejong/nhk-pronunc...ciation.py, line 49, the names of the CSV columns can be inferred (the items between the square brackets, i.e. from "NID" to "ac"):
Code:
AccentEntry = namedtuple('AccentEntry', ['NID','ID','WAVname','K_FLD','ACT','midashigo','nhk','kanjiexpr','NHKexpr','numberchars','nopronouncepos','nasalsoundpos','majiri','kaisi','KWAV','midashigo1','akusentosuu','bunshou','ac'])

As for what they mean, I'm not sure yet about several of the fields. Here are a few sample rows for reference, so others can chime in as well:
Code:
1,1,J00001.wav,0,100010010,アー,ああ,嗚呼,ああ{嗚呼},2,,,アー,0,,アー,1,0,20
2,1,J00001.wav,0,100010010,アア,ああ,嗚呼,ああ{嗚呼},2,,,アー,0,,アー,1,0,20
3,2,J00002.wav,0,100020010,アー,ああ,ああ,ああ(~言う),2,,,アー,0,,アー,1,0,1
4,2,J00002.wav,0,100020010,アア,ああ,ああ,ああ(~言う),2,,,アー,0,,アー,1,0,1
5,3,J00003.wav,1,100030010,アークトー,アーク灯,アーク灯,アーク灯,5,3,,アークトーオ,0,K00003.wav,アークトー,1,0,1111
6,3,J00003.wav,1,100030010,アークトウ,アーク灯,アーク灯,アーク灯,5,3,,アークトーオ,0,K00003.wav,アークトー,1,0,1111
7,4,J00004.wav,1,100040010,アーケード,アーケード,アーケード,アーケード,5,,,アーケードオ,0,K00004.wav,アーケード,2,0,20000
8,5,J00005.wav,1,100040010,アーケード,アーケード,アーケード,アーケード,5,,,アーケードオ,0,K00005.wav,アーケード,2,0,1200
9,6,J00006.wav,1,100050010,アース,アース,アース,アース,3,,,アースオ,0,K00006.wav,アース,1,0,200
10,7,J00007.wav,1,100060010,アーチ,アーチ,アーチ,アーチ,3,,,アーチオ,0,K00007.wav,アーチ,1,0,200
11,8,J00008.wav,1,100070010,アーチェリー,アーチェリー,アーチェリー,アーチェリー,6,,,アーチェリーオ,0,K00008.wav,アーチェリー,1,0,200000
12,9,J00009.wav,1,100080010,アーチスト,アーチスト,アーチスト,アーチスト,5,3,,アーチストオ,0,K00009.wav,アーチスト,1,0,20000
13,10,J00010.wav,1,100090010,アート,アート,アート,アート,3,,,アートオ,0,K00010.wav,アート,1,0,200
14,11,J00011.wav,1,100100010,アーバンライフ,アーバンライフ,アーバンライフ,アーバンライフ,7,,,アーバンライフオ,0,K00011.wav,アーバンライフ,1,0,111200
15,12,J00012.wav,1,100110010,アーム,アーム,アーム,アーム,3,,,アームオ,0,K00012.wav,アーム,2,0,200
16,13,J00013.wav,1,100110010,アーム,アーム,アーム,アーム,3,,,アームオ,0,K00013.wav,アーム,2,0,11
17,14,J00014.wav,1,100120010,アーメン,アーメン,アーメン,アーメン,4,,,アーメンオ,0,K00014.wav,アーメン,2,0,111
18,15,J00015.wav,1,100120010,アーメン,アーメン,アーメン,アーメン,4,,,アーメンオ,0,K00015.wav,アーメン,2,0,2000
19,16,J00016.wav,1,100130010,アーモンド,アーモンド,アーモンド,アーモンド,5,,,アーモンドオ,0,K00016.wav,アーモンド,2,0,20000
20,17,J00017.wav,1,100130010,アーモンド,アーモンド,アーモンド,アーモンド,5,,,アーモンドオ,0,K00017.wav,アーモンド,2,0,1200
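
To make the column layout concrete, here is a minimal Python sketch that loads a few of the sample rows above into the AccentEntry namedtuple and guesses a pitch accent number from the last column. The interpretation is an assumption, not something the source confirms: it treats "ac" as one digit per character of "midashigo1" with leading zeros stripped, reads "2" as the last high character before the drop, and reads a string with no "2" as flat (0). The helper accent_number and the SMALL_KANA set are hypothetical names made up for this illustration.

```python
import csv
import io
from collections import namedtuple

# Field names copied from the namedtuple quoted above.
AccentEntry = namedtuple('AccentEntry', [
    'NID', 'ID', 'WAVname', 'K_FLD', 'ACT', 'midashigo', 'nhk', 'kanjiexpr',
    'NHKexpr', 'numberchars', 'nopronouncepos', 'nasalsoundpos', 'majiri',
    'kaisi', 'KWAV', 'midashigo1', 'akusentosuu', 'bunshou', 'ac'])

# Four of the sample rows quoted above (NID 1, 5, 8 and 11).
SAMPLE = """\
1,1,J00001.wav,0,100010010,アー,ああ,嗚呼,ああ{嗚呼},2,,,アー,0,,アー,1,0,20
5,3,J00003.wav,1,100030010,アークトー,アーク灯,アーク灯,アーク灯,5,3,,アークトーオ,0,K00003.wav,アークトー,1,0,1111
8,5,J00005.wav,1,100040010,アーケード,アーケード,アーケード,アーケード,5,,,アーケードオ,0,K00005.wav,アーケード,2,0,1200
11,8,J00008.wav,1,100070010,アーチェリー,アーチェリー,アーチェリー,アーチェリー,6,,,アーチェリーオ,0,K00008.wav,アーチェリー,1,0,200000
"""

# Small kana that combine with the preceding character into one mora.
SMALL_KANA = set("ァィゥェォャュョヮ")


def accent_number(entry):
    """Guess the accent number from 'ac' (ASSUMED encoding, see lead-in)."""
    reading = entry.midashigo1
    # Assumption: leading zeros were dropped, so pad back to one digit per character.
    digits = entry.ac.rjust(len(reading), "0")
    drop = digits.find("2")  # assumption: '2' marks the character before the drop
    if drop == -1:
        return 0  # no '2' found: treated as flat (heiban)
    # Count morae up to and including the marked character.
    return sum(1 for ch in reading[:drop + 1] if ch not in SMALL_KANA)


entries = [AccentEntry(*row) for row in csv.reader(io.StringIO(SAMPLE))]
for e in entries:
    print(e.midashigo1, e.ac, accent_number(e))
```

Under those assumptions the four rows come out as 1, 0, 3 and 1. If the "ac" digits turn out to mean something else, only accent_number needs to change; the column parsing stays the same.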
#3
Read this: https://github.com/javdejong/nhk-pronunc...ciation.py
Code:
# "Class" declaration
AccentEntry = namedtuple('AccentEntry', ['NID','ID','WAVname','K_FLD','ACT','midashigo','nhk','kanjiexpr','NHKexpr','numberchars','nopronouncepos','nasalsoundpos','majiri','kaisi','KWAV','midashigo1','akusentosuu','bunshou','ac'])
Edited: 2017-12-20, 8:39 pm
#4
Kanjium has pitch accent data (as previously discussed here).
#5
(2017-12-20, 11:10 pm)fkb9g Wrote: Kanjium has pitch accent data (as previously discussed here).

Thanks!
I found exactly what I was looking for here: https://github.com/mifunetoshiro/kanjium..._files/raw (the accents.txt file).
#6
Episode 7 of Dōgen's Japanese Phonetics course shows how to look up pitch accent information in スーパー大辞林, which is the ja–ja dictionary included with macOS.

I don't have time to look up words one at a time, so I add this information to my flash cards using this dump with 190,881 entries.

PNG diagrams and Anki-ready HTML are provided for words with six or fewer morae.