Rikaichan: RevTK Community Edition - Printable Version
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: Rikaichan: RevTK Community Edition (/thread-5351.html)

Rikaichan: RevTK Community Edition - cb4960 - 2012-11-13
sikieiki Wrote: Noticed something which I believe to be a bug.

Rikaisama currently uses the unconjugated kanji form of the word (こき下ろす), which doesn't exist in Sanseido. For a future release, I would like to do something like this:
1) First try looking up the unconjugated kanji form.
2) If nothing is found, try looking up the unconjugated reading.
3) If nothing is found, fall back to the default EDICT dictionary.
I might have time this weekend for a Rikaisama update. Maybe this will make it in.

Rikaichan: RevTK Community Edition - cb4960 - 2012-11-13
Fangio Wrote: If there is a new version on the way, I would really support including support for pitch accent, as already suggested (there are still so few materials to work on pitch accents!).

Can you explain how pitch accents are traditionally presented to the user and also point me to a resource that I can use to extract them?

Rikaichan: RevTK Community Edition - cb4960 - 2012-11-17
I have just uploaded version 16.0 of the Rikaisama Firefox extension.
Download version 16.0 via SourceForge

What's New?
● Merged code with rikaichan 2.07 baseline.
● Implemented a fallback mechanism for Sanseido mode. (Thanks sikieiki!)
  1) First, the kanji form of the word is searched (example: こき下ろす).
  2) If the kanji form is not found, the kana form is searched (example: こきおろす).
  3) If all else fails, the default non-Sanseido definition is displayed.
● Fixed display of definition numbers and sub-definition numbers in Sanseido mode. Also, sub-definition numbers are now circled.

cb4960

Rikaichan: RevTK Community Edition - sikieiki - 2012-11-17
cb4960 Wrote: I have just uploaded version 16.0 of the Rikaisama Firefox extension.

Thank you for the update. The Sanseido lookup seems to work as expected now. I noticed the voice lookup still has the old problem though.

Rikaichan: RevTK Community Edition - cb4960 - 2012-11-18
sikieiki Wrote: Thank you for the update.
The Sanseido lookup seems to work as expected now. I noticed the voice lookup still has the old problem though.

I'll see about fixing the voice lookup code for the next version. Thanks.

Rikaichan: RevTK Community Edition - Fangio - 2012-11-30
cb4960 Wrote:
  Fangio Wrote: If there is a new version on the way, I would really support including support for pitch accent, as already suggested (there are still so few materials to work on pitch accents!).
  Can you explain how pitch accents are traditionally presented to the user and also point me to a resource that I can use to extract them?

Here's a lot of input about pitch accent and how it was implemented (in the case of spreadsheets): http://forum.koohii.com/showthread.php?tid=9437
Also an Anki plugin (unfortunately not yet ported to Anki 2.0): http://forum.koohii.com/showthread.php?pid=167578#pid167578
Both use Japanese-only online dictionaries, which use the common numbering system (the number of the "accented" mora, i.e. the mora with a falling pitch).
A great (but probably somewhat complex) option, as an addition rather than an alternative, is the representation used here: http://accent.u-biq.org/english.html

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-01
Quote: Here's a lot of input about pitch accent and how it was implemented (in the case of spreadsheets): http://forum.koohii.com/showthread.php?tid=9437

Thanks for the info. Pardon my original laziness, as I could have just looked at the Wikipedia entry, which contained everything that I needed to know about the problem domain.

In a sense, Rikaisama already supports pitch accents by allowing you to use EPWING mode with the 大辞林 dictionary. To enable pitch accents for non-EPWING mode, I could create a database beforehand by running each word in EDICT through the 大辞林 EPWING dictionary and parsing out the pitch accent information. Should be simple enough.

The method of display presented on the accent.u-biq.org site is interesting.
Perhaps I could be a little less fancy and use underlined/bold/colored text for mora with high pitch.

Rikaichan: RevTK Community Edition - toshiromiballza - 2012-12-01
cb4960 Wrote: Perhaps I could be a little less fancy and use underlined/bold/colored text for mora with high pitch.

But how would words with two (or more?) possible accents be treated? Some are marked as 01, etc.

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-01
toshiromiballza Wrote:
  cb4960 Wrote: Perhaps I could be a little less fancy and use underlined/bold/colored text for mora with high pitch.
  But how would words with two (or more?) possible accents be treated? Some are marked as 01, etc.

Very good point. Until somebody can think of something more clever, I'll just append 01 or [0][1] and forgo the fancy visualization.

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-01
Here is a human-readable version of the pitch accent database that is used by Rikaisama:

Tab-Separated Values Format:
Download japanese_pitch_accents_121216.txt via MediaFire

Notes:
The database contains 117,101 entries.
The database was generated by dumping the full contents of the 『大辞林 第2版』, 『NHK日本語発音アクセント辞典』, and 『新明解国語辞典第五版』 EPWING dictionaries to a text file using EBライブラリー and then parsing out the pitch information for words used by Rikaisama's default dictionary (EDICT). Processing takes a combined 15 seconds.

Database Format:
Column 1 = Expression -or- reading if word has no expression.
Column 2 = Reading -or- blank if word has no expression.
Column 3 = Pitch accent.

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-02
I have just uploaded version 17.0 of the Rikaisama Firefox extension.
Download version 17.0 via SourceForge

What's New?
● Added pitch accent information to the right of the reading if available. (Thanks Fangio!) To enable, check the "Options... -> Dictionaries Tab -> Show pitch accent" checkbox.
Screenshots: [three images]

Perhaps a future version will have a better visualization of the pitch accent, such as underlining the mora with the high pitch (where possible).

cb4960

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-02
I have just uploaded version 17.1 of the Rikaisama Firefox extension.
Download version 17.1 via SourceForge

What's New?
● Added pitch accents for katakana words.

cb4960

Rikaichan: RevTK Community Edition - HelenF - 2012-12-02
Thanks for the pitch database. People have been trying to make that for ages using website-scraping tools. Looks like you've found a much more effective solution.

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-02
HelenF Wrote: Thanks for the pitch database. People have been trying to make that for ages using website-scraping tools. Looks like you've found a much more effective solution.

You're welcome. Just keep in mind that it is a minimal database designed specifically to be used with the words that Rikaisama can recognize. 大辞林 第2版 has pitch accent information for many more words than exist in the database.

Rikaichan: RevTK Community Edition - Sebastian - 2012-12-02
Adding accent data to Rikaisama sounds great!

cb4960 Wrote:
  toshiromiballza Wrote:
    cb4960 Wrote: Perhaps I could be a little less fancy and use underlined/bold/colored text for mora with high pitch.
    But how would words with two (or more?) possible accents be treated? Some are marked as 01, etc.
  Very good point. Until somebody can think of something more clever, I'll just append 01 or [0][1] and forgo the fancy visualization.

You could just write the reading several times to cover the different accents. For example:
Quote: 飛車
Where \ and  ̄ represent the underlining and "overline" that would signal the accent.

Rikaichan: RevTK Community Edition - toshiromiballza - 2012-12-03
Sebastian Wrote: You could just write the reading several times to cover the different accents.

I think that's a bad idea. It would create too much unnecessary clutter.
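
For anyone who wants to work with the pitch accent database directly, the three-column layout described earlier in the thread (expression, reading, pitch accent) could be loaded roughly as follows. This is only a minimal sketch: the names `PitchEntry`, `load_pitch_entries` and `build_index` are made up for illustration, the sample rows are taken from examples posted in this thread rather than from the file itself, and the real file's encoding and any header lines are not confirmed here.

```python
import csv
import io
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PitchEntry(NamedTuple):
    expression: str  # column 1: expression, or the reading if there is no expression
    reading: str     # column 2: reading, or "" if there is no expression
    pitch: str       # column 3: raw pitch accent string, kept unparsed


def load_pitch_entries(lines: Iterable[str]) -> List[PitchEntry]:
    """Read tab-separated rows, skipping any row that does not have 3 columns."""
    entries = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) == 3:
            entries.append(PitchEntry(*row))
    return entries


def build_index(entries: Iterable[PitchEntry]) -> Dict[Tuple[str, str], List[str]]:
    """Map (expression, reading) to every pitch string recorded for that pair."""
    index: Dict[Tuple[str, str], List[str]] = {}
    for entry in entries:
        index.setdefault((entry.expression, entry.reading), []).append(entry.pitch)
    return index


# Sample rows in the assumed layout (not copied from the real file).
sample = "私年号\tしねんごう\t2\nドライバー\t\t20,0\n"
index = build_index(load_pitch_entries(io.StringIO(sample)))

# With the downloaded file, something like this should work if it is UTF-8
# (the encoding is an assumption):
# with open("japanese_pitch_accents_121216.txt", encoding="utf-8", newline="") as f:
#     index = build_index(load_pitch_entries(f))
```

Keying the index on both expression and reading keeps words that share a spelling but differ in reading apart, and kana-only words simply end up with an empty reading.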

Rikaichan: RevTK Community Edition - Fangio - 2012-12-03
cb4960 Wrote: I have just uploaded version 17.0 of the Rikaisama Firefox extension.

I should be the one thanking you, this is great! Thanks a million!

Rikaichan: RevTK Community Edition - sikieiki - 2012-12-03
Would it be possible to get a key to copy the highlighted word? You can now use C to copy, but it copies the definition as well. Can you make Alt+C copy the highlighted word/phrase only? Sometimes there is no definition and you want to search manually. Of course, you can do this by right-clicking and using "Search Google for XXX" to get the same effect, but it might be useful to get just the word in the clipboard by itself.

Additionally, I was wondering if there are any plans to support Firefox on Android? It looks like Firefox on Android supports add-ons, but I don't know how much of a change it would be to make it work.

Rikaichan: RevTK Community Edition - toshiromiballza - 2012-12-03
How is it that some words are missing the accent? Are they not in Daijirin? 伸ばす or 並べる, for example.

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-03
toshiromiballza Wrote:
  Sebastian Wrote: You could just write the reading several times to cover the different accents.
  I think that's a bad idea. It would create too much unnecessary clutter.

Agreed.

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-03
sikieiki Wrote: Would it be possible to get a key to copy the highlighted word? You can now use C to copy, but it copies the definition as well. Can you make Alt+C copy the highlighted word/phrase only? Sometimes there is no definition and you want to search manually. Of course, you can do this by right-clicking and using "Search Google for XXX" to get the same effect, but it might be useful to get just the word in the clipboard by itself.

You can press Ctrl-C to copy just the highlighted word/phrase.

sikieiki Wrote: Additionally, I was wondering if there are any plans to support Firefox on Android?
It looks like Firefox on Android supports add-ons, but I don't know how much of a change it would be to make it work.

No plans.

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-03
toshiromiballza Wrote: How is it that some words are missing the accent? Are they not in Daijirin?

Good catch. Those words are in Daijirin and have pitch accents. The problem seems to be either a bug in the library that I'm using to perform EPWING lookups or the way I'm using the library. I'll need to investigate. Hopefully this is fixable.

Edit: I believe that I have found the bug in EBライブラリ. It appears that for Daijirin, it is incorrectly converting the voiced consonants of the JIS 0208 form of the word to their non-voiced equivalents (e.g. ば -> は). Bypassing that particular conversion routine (eb_convert_voiced_consonants_jis) seems to fix the problem for my test words (伸ばす, 並べる, 黄ばむ). More testing is needed though.

Rikaichan: RevTK Community Edition - toshiromiballza - 2012-12-05
cb4960 Wrote: I believe that I have found the bug in EBライブラリ. It appears that for Daijirin, it is incorrectly converting the voiced consonants of the JIS 0208 form of the word to their non-voiced equivalents (e.g. ば -> は). Bypassing that particular conversion routine (eb_convert_voiced_consonants_jis) seems to fix the problem for my test words (伸ばす, 並べる, 黄ばむ). More testing is needed though.

What about 表れる, 賜る or 行う? No voiced consonants.
Also, what are the accents marked with a hyphen between them? [1]-[1], [0]-[0], [1]-[0], [1][1]-[0], etc.

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-08
toshiromiballza Wrote: What about 表れる, 賜る or 行う? No voiced consonants.

Pitch accents for these words are in v17.2, which I'm about to post. If you find any more, please don't hesitate to post them.

toshiromiballza Wrote: Also, what are the accents marked with a hyphen between them? [1]-[1], [0]-[0], [1]-[0], [1][1]-[0], etc.

I'm not sure. Maybe some smart person can enlighten us.
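
To make the voiced-consonant problem discussed above concrete, here is a small illustration of what happens to the test words when voiced kana are replaced by their non-voiced equivalents (ば -> は). It is not the EB library's routine and does not touch JIS 0208 at all; it only imitates the described effect with Unicode normalization, and `strip_voicing` is a made-up helper name.

```python
import unicodedata

DAKUTEN = "\u3099"  # combining voiced sound mark


def strip_voicing(text: str) -> str:
    """Replace voiced kana with their non-voiced base kana (e.g. ば -> は)."""
    # NFD splits a voiced kana into base kana + combining dakuten;
    # dropping the mark and recomposing leaves the non-voiced kana.
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(ch for ch in decomposed if ch != DAKUTEN)
    return unicodedata.normalize("NFC", without_marks)


for word in ("伸ばす", "並べる", "黄ばむ", "行う"):
    print(word, "->", strip_voicing(word))
```

A headword altered this way (伸はす instead of 伸ばす) no longer matches the dictionary entry, which fits the missing accents reported for those three words. 行う comes out unchanged, so its missing accent must have had a different cause, as toshiromiballza pointed out.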

Rikaichan: RevTK Community Edition - cb4960 - 2012-12-08
I have just uploaded version 17.2 of the Rikaisama Firefox extension.
Download version 17.2 via SourceForge

What's New?
● Added pitch accents for 10,751 more words. This brings the total to 107,309.
● For many words, added which part-of-speech a pitch accent applies to.
● Added the "Hide pitch accent part-of-speech unless ',' or '|' is present" option. It is enabled by default. Read the next section to learn what ',' and '|' represent.
● Changed the format of pitch accents.

The new pitch accent format:

<blank> - Example: 単眼鏡 たんがんきょう
No pitch accent information available for this word.

0 – Example: 洗う あらう 0
Zero means no accent. From Wikipedia: "Word doesn't have an accent, the pitch rises from a low starting point on the first mora or two, and then levels out in the middle of the speaker's range, without ever reaching the high tone of an accented mora. Japanese describe the sound as "flat" (平板 heiban) or "accentless"."

2 – Example: 願う ねがう 2
The "2" indicates that the accent is on the 2nd mora (the が).

32 – Example: 著作権 ちょさくけん 32
The "32" indicates that the accent can be on either the 3rd mora (く) or the 2nd mora (さ). This is in frequency order, meaning that it is more common for the accent to be on the 3rd mora than the 2nd mora.

{11} – Example: 超越論的観念論 ちょうえつろんてきかんねんろん {11}
Curly braces are placed around pitch accents that are in the double digits. The "11" indicates that the accent is on the 11th mora.

21,0 – Example: 飛車 ひしゃ 21,0
For some words, Daijirin contains multiple sub-definitions in an entry. Sometimes each sub-definition can have a different pitch. A comma separates the pitch accents for the multiple sub-definitions. The "21,0" means that in the 1st sub-definition of the word, the accent is on either the 2nd mora (しゃ) or the 1st mora (ひ), and that in the 2nd sub-definition of the word, no accent is present.

1-2 – Example: 思案投げ首 しあんなげくび 1-2
I'm not sure what the "-" is supposed to represent.
It is present in Daijirin so I left it in.

1|Ø – Example: 朝日 あさひ 1|Ø
For some words, Daijirin contains multiple entries that have identical expressions and readings. The "|" separates the pitch found in each entry. The "1" indicates that in the first Daijirin entry, the pitch accent was on the first mora. The "Ø" symbol indicates that the other Daijirin entry contained no pitch accent information.

(part-of-speech) – Example: 点々 てんてん (名)03,(副)0,(形動タリ)0
Sometimes the pitch accent changes depending on the word's part-of-speech. The part-of-speech is placed inside parentheses. The above example shows that the pitch accent is "03" when the word is used as a noun, "0" when the word is used as an adverb, and "0" when used as a noun adjective (specifically a "classical form of na-adjective inflection formed by contraction of the particle "to" with the classical verb "ari" ("aru")").

Valid part-of-speech options:
(名) 名詞
(代) 代名詞
(動五) 動詞五段活用
(動五[四]) 動詞口語五段活用・文語四段活用
(動四) 動詞四段活用
(動上一) 動詞上一段活用
(動上二) 動詞上二段活用
(動下一) 動詞下一段活用
(動下二) 動詞下二段活用
(動カ変) 動詞カ行変格活用
(動サ変) 動詞サ行変格活用
(動ナ変) 動詞ナ行変格活用
(動ラ変) 動詞ラ行変格活用
(動特活) 動詞特別活用
(形) 形容詞
(形ク) 形容詞ク活用
(形シク) 形容詞シク活用
(形動) 形容動詞
(形動ナリ) 形容動詞ナリ活用
(形動タリ) 形容動詞タリ活用
(ト|タル) 「~と」(副)「~たる」(連体詞)の形で用いられるもの
(連体) 連体詞
(副) 副詞
(接続) 接続詞
(感) 感動詞
(助動) 助動詞
(格助) 格助詞
(接助) 接続助詞
(副助) 副助詞
(係助) 係助詞
(終助) 終助詞
(間投助) 間投助詞
(並立助) 並立助詞
(準体助) 準体助詞
(接頭) 接頭語
(接尾) 接尾語
(連語) 連語
(枕詞) 枕詞

Random examples:
私年号 しねんごう 2
ドライバー 20,0
我がまま わがまま (名・形動)34
角 かく (名・形動)21,12,20
去る さる (動ラ五四)1|(連体)1
現金自動支払機 げんきんじどうしはらいき {10}3-6
十重二十重 とえはたえ 11-2

cb4960
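
The pitch accent format described above is regular enough to take apart with a small tokenizer. The following is a rough sketch, not Rikaisama's own code: `SubDefinition` and `parse_pitch` are hypothetical names, a part-of-speech tag is attached only to the sub-definition it directly precedes (an assumption), and since the meaning of "-" is unknown, the parser just keeps the numbers on either side of it in separate groups. Tags are matched as whole tokens so that the "|" inside (ト|タル) is not mistaken for an entry separator.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# One token: (part-of-speech), {multi-digit accent}, single digit, separator, or Ø.
TOKEN = re.compile(
    r"\((?P<pos>[^)]*)\)|\{(?P<multi>\d+)\}|(?P<digit>\d)|(?P<sep>[,|-])|(?P<none>Ø)"
)


@dataclass
class SubDefinition:
    pos: Optional[str] = None  # part-of-speech tag, if one was given
    # Accent numbers; a new inner list is started at every "-".
    groups: List[List[int]] = field(default_factory=lambda: [[]])


def parse_pitch(raw: str) -> List[List[SubDefinition]]:
    """Return a list of entries ("|"), each a list of sub-definitions (",")."""
    entries = [[SubDefinition()]]
    position = 0
    while position < len(raw):
        match = TOKEN.match(raw, position)
        if match is None:
            raise ValueError(f"unexpected character at {position} in {raw!r}")
        position = match.end()
        current = entries[-1][-1]
        if match.group("pos") is not None:
            current.pos = match.group("pos")
        elif match.group("multi") is not None:
            current.groups[-1].append(int(match.group("multi")))
        elif match.group("digit") is not None:
            current.groups[-1].append(int(match.group("digit")))
        elif match.group("sep") == ",":
            entries[-1].append(SubDefinition())   # next sub-definition
        elif match.group("sep") == "|":
            entries.append([SubDefinition()])     # next dictionary entry
        elif match.group("sep") == "-":
            current.groups.append([])             # meaning unknown; keep the split
        # "Ø" adds nothing: the entry stays empty (no pitch information).
    return entries


print(parse_pitch("21,0"))
print(parse_pitch("(名)03,(副)0,(形動タリ)0"))
print(parse_pitch("{10}3-6"))
```

With this layout, "1|Ø" yields two entries, the second of which has an empty group, and a blank pitch field yields a single empty entry.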