That's 奉げる (ささげる, usually spelled 捧げる), not 挙げる.
2015-06-14, 12:40 pm
2015-06-14, 12:47 pm
jcdietz03 Wrote: Everything in this post is using the 14Jun15 0300 version
jcdietz03 Wrote: I don't like the auto-furigana feature. So I made a list with the dictionary feature and picked the ones I wanted to keep.
If 0300 is the time of my post, then you are using an outdated version. pastebin.com/XXTi8yyn is the XX45 latest version.
The old versions didn't support auto-furigana in many situations; I spent hours finding test data and fixing this.
If that's not the issue, then tell me exactly why you don't like the auto-furigana.
In the latest version the furigana is
そして彼から供物を捧げる<儀式>を執り行うっために
そして彼#2Rかれ#から供物#4Rそなえもの#を捧#2Rささ#げる<儀式#4Rぎしき#>を執り行#6Rとりおこな#うっために
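The markup in the second line has the shape `#<n>R<reading>#`, placed right after the annotated text. Here is a minimal sketch of producing it. It assumes (inferred from the example only, not confirmed by the script) that `<n>` is the byte length of the annotated text in Shift-JIS. `annotate` is a hypothetical helper, not a function from the script.

```python
def annotate(base, reading):
    """Append a furigana opcode to `base`.

    Assumption: the number is the Shift-JIS (cp932) byte length of `base`,
    which is consistent with the example (彼 -> 2, 供物 -> 4, 執り行 -> 6).
    """
    width = len(base.encode("cp932"))
    return "{0}#{1}R{2}#".format(base, width, reading)


print(annotate("彼", "かれ"))
print(annotate("供物", "そなえもの"))
print(annotate("執り行", "とりおこな"))
```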
魔導杖 is in the test cases, 510th and 512th line.
The script's output and this website's output are the same in most cases:
nihongo.j-talk.com
jcdietz03 Wrote: But your program does not like it if the furigana is "" so I made it one fullwidth space instead and that seemed to work.
You cannot give an empty string as the second parameter. :#
If you don't need the auto-furigana, you can disable it:
Code:
DISABLE_AUTO_FURIGANA = True
I haven't tested this code:
Code:
import codecs
import sys

mecab = MecabController()
# Load "word,reading" pairs from the dictionary; skip entries without a reading
with codecs.open('ZnK Dictionary.txt', 'r', 'utf-8') as f:
    for line in f:
        line = line.rstrip('\n').split(',')
        if len(line) > 1 and len(line[1]) > 0:
            mecab.remember(line[0], line[1])
#mecab.remember( u"奉げ", u"ささげ" )
print(str(len(mecab.translation)) + " keys")
# Convert the input file (argv[1]) line by line into the output file (argv[2])
outfid = codecs.open(sys.argv[2], "w", "utf8")
with codecs.open(sys.argv[1], "r", "utf8") as fid:
    for s in fid:
        outfid.write(mecab.opcode_restore(s))
outfid.close()
Edited: 2015-06-14, 1:21 pm
2015-06-14, 1:29 pm
I'd rather have manual control. It's because of entries such as:
この街,クロスベル
この街 is probably Crossbell most of the time, but it's not going to be Crossbell all the time, I don't think. So I want to go with the original furigana where specified and with MeCab furigana where it's not.
一課,ウ チ
Lots of people talk about "Section One."
The character who says this is from Section One. But, throughout the game, those not from Section One also talk about Section One. It wouldn't be appropriate for those not from Section One to say ウチ when they talk about it.
二課,ウ チ - See above
二十,ハタチ
That's fine if it's 20 years old, but if it's just twenty it should be にじゅう.
仔猫,キティ
I'd just rather the standard furigana for this really.
A lot of them fit into this category.
先方,ルバーチェ
"The other party, Revache." But throughout the game, 先方 could be used to refer to various other parties depending on the situation.
There's another category where MeCab will produce the right answer anyway.
仮初,かりそめ
供物,くもつ
etc...
I got 212 keys when I used your dictionary tool and I kept the ones where I thought the standard reading was just plain wrong, usually because it's a made-up word (Orbal Staff,魔導杖,オーバルスタッフ), or confusing (bracer, 遊撃士,ブレイサー).
I like behavior where it will:
1) use the pre-defined reading already present if present
2) use the furigana dictionary
3) use MeCab
in that order
The pre-defined reading isn't always appropriate for all situations as explained above. The ones I have in the dictionary are probably fine for all situations.
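In code, that order could look something like the sketch below. `choose_reading` and `mecab_lookup` are made-up names for illustration, not part of the script, and the MeCab step is replaced by a plain lookup table.

```python
def choose_reading(word, predefined, dictionary, mecab_lookup):
    """Pick a reading using the priority order described above.

    predefined   -- reading already attached to the word in the text, or None
    dictionary   -- dict mapping words to hand-picked readings
    mecab_lookup -- callable returning a reading (hypothetical stand-in for MeCab)
    """
    if predefined:              # 1) reading already present in the text
        return predefined
    if word in dictionary:      # 2) furigana dictionary
        return dictionary[word]
    return mecab_lookup(word)   # 3) fall back to MeCab


dictionary = {"魔導杖": "オーバルスタッフ", "遊撃士": "ブレイサー"}
fake_mecab = {"先方": "せんぽう"}.get  # stand-in for a real MeCab call

print(choose_reading("先方", "ルバーチェ", dictionary, fake_mecab))
print(choose_reading("魔導杖", None, dictionary, fake_mecab))
print(choose_reading("先方", None, dictionary, fake_mecab))
```

With this order, 先方 only becomes ルバーチェ on lines where the text already says so, and gets the ordinary reading everywhere else.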
2015-06-21, 4:01 am
jcdietz03 Wrote: 魔導杖,,,50000,名詞,一般,*,*,*,*,魔導杖,オーバルスタッフ,オーバルスタッフ
50000 is too big: the higher the number, the *less* likely it'll be preserved. Try making this 100, or even lower. Most words in a couple of IPADIC files I looked at were between 2000 and 10000, so 100 should be low enough to make MeCab treat 魔導杖 as one morpheme.
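If there are many such entries, the rows could be generated instead of typed by hand. This is only a sketch: `user_dic_line` is a hypothetical helper, the field layout is copied from the quoted entry, and the default cost of 100 is just the value suggested above.

```python
def user_dic_line(surface, reading, cost=100):
    """Build one user-dictionary CSV row in the layout of the quoted entry.

    The two context-ID fields are left empty, as in the quoted entry.
    A lower cost makes MeCab more likely to keep the word as one morpheme.
    """
    fields = [surface, "", "", str(cost),
              "名詞", "一般", "*", "*", "*", "*",
              surface, reading, reading]
    return ",".join(fields)


print(user_dic_line("魔導杖", "オーバルスタッフ"))
```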
