
JGlossator - Interactive Tool for Creating Glosses from Japanese Text

#1
Hello,

You may use JGlossator to create a gloss for Japanese text complete with
de-inflected expressions, readings, audio pronunciation, example sentences,
pitch accent, word frequency, kanji information, and grammar analysis. Screenshot:

[Image: main.png]

JGlossator will automatically gloss any Japanese text that you copy to the clipboard. Setting aside more obvious usage, this makes it ideal for use with Capture2Text when reading manga or with AGTH/ITH when either playing visual novels or watching video with Japanese subtitles.

By right-clicking on a glossed entry you will be presented with a menu that allows you to view alternate entries, save the current entry to file, or listen to an audio pronunciation.

JGlossator is highly configurable and allows you to modify many of the default behaviors and settings. For example, you can turn off the clipboard monitor, change themes, specify a new save format, remove pitch accent, etc. Just press the options button on the far right.

You can also perform a gloss using your favorite EPWING dictionaries. Just add them to the Dictionary Setup tab of the Options dialog.

Type in an English word to search definitions instead. The resulting list will be sorted based on frequency. To search whole words only, add "w/" in front of the word. To use a regular expression, type "r/" followed by a regular expression.
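As a rough illustration of the search prefixes, here is a minimal Python sketch of how such a query could be classified and matched. It is not JGlossator's actual code; the function names `parse_query` and `matches`, the ASCII test used to decide that the text "contains only English", and the case-insensitive matching are all assumptions. The trailing-space shortcut for whole-word search comes from the v4.0 release notes later in this thread. Kanji-search prefixes are left out here.

```python
import re


def parse_query(text):
    """Classify the input box text and return (mode, term).

    Modes: 'gloss' (normal Japanese gloss), 'regex', 'whole', 'substring'.
    Assumption: "English only" is approximated by an ASCII-only check.
    """
    if not text.isascii():
        return ("gloss", text)
    if text.startswith("r/"):
        return ("regex", text[2:])
    if text.startswith("w/"):
        return ("whole", text[2:].strip())
    if text.endswith(" "):
        # Trailing space is described as an alternative to the "w/" prefix.
        return ("whole", text.strip())
    return ("substring", text.strip())


def matches(mode, term, definition):
    """Return True if an English definition matches the parsed query."""
    if mode == "regex":
        return re.search(term, definition) is not None
    if mode == "whole":
        pattern = r"\b" + re.escape(term) + r"\b"
        return re.search(pattern, definition, re.IGNORECASE) is not None
    # Default: plain case-insensitive substring match (an assumption).
    return term.lower() in definition.lower()


mode, term = parse_query("w/experiment")
print(mode, matches(mode, term, "experimental design"))
```

With this sketch, "w/experiment" would match a definition containing the standalone word "experiment" but not "experimental", while plain "experiment" would match both.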

Kanji search is supported as well. Just use one of these formats:
Search Based On     Format
Meanings            km/<comma-separated list of meanings>
RTK Primitives      kp/<comma-separated list of RTK primitives>
Radical meanings    kr/<comma-separated list of radical meanings>
ON readings         ko/<comma-separated list of ON readings>
KUN readings        kk/<comma-separated list of KUN readings>
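To make the kanji search formats concrete, here is a small Python sketch of how a query such as "kp/person,ten" might be split and applied. This is only an illustration under stated assumptions, not JGlossator's implementation: the field names, the `parse_kanji_query` and `search_kanji` helpers, and the rule that every listed term must be present are guesses (the last one based on the "person" and "ten" example in the v4.0 release notes later in this thread, which also mention sorting by stroke count and then frequency). The sample entries are placeholders with invented values.

```python
# Prefix-to-field mapping taken from the formats above; the field names are hypothetical.
KANJI_FIELDS = {
    "km": "meanings",
    "kp": "primitives",
    "kr": "radical_meanings",
    "ko": "on_readings",
    "kk": "kun_readings",
}


def parse_kanji_query(text):
    """Split e.g. 'kp/person,ten' into ('primitives', ['person', 'ten']).

    Returns None when the text does not start with a known kanji prefix.
    """
    prefix, slash, rest = text.partition("/")
    if not slash or prefix not in KANJI_FIELDS:
        return None
    terms = [t.strip().lower() for t in rest.split(",") if t.strip()]
    return KANJI_FIELDS[prefix], terms


def search_kanji(entries, field, terms):
    """Keep entries whose field contains every term; sort by strokes, then frequency rank."""
    hits = [e for e in entries if all(t in e.get(field, ()) for t in terms)]
    return sorted(hits, key=lambda e: (e["strokes"], e["frequency_rank"]))


# Placeholder entries: labels, stroke counts, and ranks are invented for illustration.
ENTRIES = [
    {"kanji": "A", "primitives": ["person", "ten", "mouth"], "strokes": 9, "frequency_rank": 300},
    {"kanji": "B", "primitives": ["person", "ten"], "strokes": 4, "frequency_rank": 900},
    {"kanji": "C", "primitives": ["person", "tree"], "strokes": 6, "frequency_rank": 50},
    {"kanji": "D", "primitives": ["ten", "person"], "strokes": 4, "frequency_rank": 120},
]

field, terms = parse_kanji_query("kp/person, ten")
results = [e["kanji"] for e in search_kanji(ENTRIES, field, terms)]
print(results)  # ['D', 'B', 'A']
```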

Know basic HTML/CSS? Want to change a font, color, or maybe even the format of the kanji gloss? No problem, just create a new theme in the themes directory or modify an existing one.

Some useful shortcut keys:
ESC                                                 Place cursor in the input box
Up (when the input box has focus)                   Clear text in the input box
Backspace (when the input box doesn't have focus)   Go back through the history
Ctrl-Up                                             Go back through the history
Ctrl-Down                                           Go forward through the history

Download the latest version via SourceForge (source code is available as well)

You will need Windows (XP/Vista/7/8) and .NET Framework v3.5 installed.

Please visit the JGlossator homepage for more information and screenshots.

Have Fun!
cb4960
Edited: 2015-09-07, 10:30 pm
#2
Tips:

1) You can place the audio clips located in this thread into the audio directory to prevent the delay that occurs when JGlossator downloads them on demand.

2) If you don't want a particular word to appear in the gloss, just place its dictionary form in blacklist.txt.
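The blacklist idea from tip 2 can be pictured with a short Python sketch. It assumes blacklist.txt holds one dictionary form per line, which the thread does not spell out, and it uses a made-up entry structure (`surface` and `dictionary_form` keys) rather than anything from JGlossator itself.

```python
def load_blacklist(lines):
    """Build a set of dictionary forms from the lines of a blacklist file.

    Assumption: one dictionary form per line; blank lines are ignored.
    """
    return {line.strip() for line in lines if line.strip()}


def filter_gloss(entries, blacklist):
    """Drop gloss entries whose dictionary form is blacklisted."""
    return [e for e in entries if e["dictionary_form"] not in blacklist]


# Hypothetical gloss entries: an inflected surface form plus its dictionary form.
entries = [
    {"surface": "私", "dictionary_form": "私"},
    {"surface": "は", "dictionary_form": "は"},
    {"surface": "食べました", "dictionary_form": "食べる"},
]

# In practice the lines would come from open("blacklist.txt", encoding="utf-8").
blacklist = load_blacklist(["は\n", "\n", "食べる\n"])
kept = [e["surface"] for e in filter_gloss(entries, blacklist)]
print(kept)  # ['私']
```

Matching on the dictionary form rather than the surface form is what lets a single blacklist line hide every inflection of a word.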
Edited: 2013-01-14, 10:45 pm
#3
Looks nice.
#4
Looks really great... Thank you!
#5
Lord and Savior CB4960 is at it again.

Looks useful to use with TA when reading visual novels.
Edited: 2013-01-13, 8:55 pm
#6
Thank you. I have it set up to gloss dorama subs on the fly using AGTH. Now I can just pause the video when I have trouble understanding the dialogue.

Here is a simple bat file I made:

start /d "D:\JGlossator_v1.0" JGlossator.exe
REM First opens JGlossator
start /d "D:\AGTH" agth.exe /c150 "F:\MPC-HC\mpc-hc.exe"
REM Then it opens AGTH and tells it to listen to mpc-hc
REM The '/c150' part tells agth to automatically send the text to the clipboard every 150ms
REM Since JGlossator watches the clipboard, it automatically glosses once the video begins

Once the subtitles begin, you'll need to select the proper channel in JGlossator's pull-down box.

EDIT: Would it be possible to have the program pull the audio files from the JDIC zip file, like Anki 1 is able to do with the plugin you made?
Edited: 2013-01-14, 7:47 am
#7
This is excellent! Thank you very much indeed!
#8
Just wondering.
What language(s) did you write this in and what other tools did you use?

Thanks.
#9
Thanks for the replies.

@Oniichan,

Something like that can probably be arranged. For now the workaround is to unzip all of those audio clips into the audio directory.

@chamcham,

As with most of my programs, I used C#.

Other tools used (from the readme):

EB Library
ftp://ftp.sra.co.jp/pub/misc/eb/
Library for EPWING dictionary lookups.

Mecab (by Taku Kudo and Nippon Telegraph and Telephone Corporation)
http://code.google.com/p/mecab/
Morphological Analyzer for Japanese Text.

CaboCha (by Taku Kudo)
http://code.google.com/p/cabocha/
Japanese Dependency Structure Analyzer.

jCorrect (by Hiroyuki Ohsaki)
http://www.ispl.jp/~oosaki/research/tips-jcorrect/
Used to generate grammar tips for technical writing.

Obi-2 (by Satoshi Sato)
http://kotoba.nuee.nagoya-u.ac.jp/sc/obi2/obi_e.html
Readability Analyzer of Japanese Texts.

EDICT (by Jim Breen)
http://www.csse.monash.edu.au/~jwb/edict.html
Free Japanese-English dictionary.

ENAMDICT (by Jim Breen)
http://www.csse.monash.edu.au/~jwb/enamdict_doc.html
Free Japanese names dictionary.

KANJIDIC
http://www.csse.monash.edu.au/~jwb/kanjidic.html
Free kanji dictionary.

Tatoeba
http://tatoeba.org/eng/home
Free example sentence corpus.

System.Data.SQLite
http://system.data.sqlite.org/index.html...index.wiki
ADO.NET adapter for SQLite.

Rikaichan (by Jonathan Zarate)
http://www.polarcloud.com/rikaichan/
Code to get the deinflection rules.

Nini (by Brent R. Matzelle)
http://nini.sourceforge.net/index.php
.Net configuration library.

BASS Audio Library (by un4seen developments)
http://www.un4seen.com/bass.html
Audio library.

Kanji Stroke Order Font (Ulrich Apel, the AAAA project and the Wadoku project)
http://www.nihilist.org.uk/

Iconic (by P.J. Onori)
http://somerandomdude.com/work/iconic/
Font-based icon set.
Edited: 2013-03-03, 2:02 am
#10
This is a great tool, love the support for EPWING dictionaries and the blacklist feature.

Unfortunately I've been getting this error when trying to copy text from the definitions or use the right-click function:

An error has occurred in the script on this page.

Line: 131
Char: 5
Error: Could not load file or assembly 'Microsoft.mshtml, Version=7.0.3300.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies. The system cannot find the file specified.
Code: 0
URL: about:blank

Do you want to continue running scripts on this page?

I'm running Windows 7 Ultimate x64 with both .NET Framework 3.5 and 4 installed. Is there a specific reason why it's trying to use mshtml?
Edited: 2013-02-13, 10:44 pm
#11
At0m5k Wrote:Is there a specific reason why it's trying to use mshtml?
MSHTML is used by the WebBrowser control which I used for the ruby pane, gloss pane, and kanji pane.

I (wrongly) assumed that it was a standard Windows component.

I uploaded mshtml here (MediaFire link). Unzip it to the same directory as JGlossator.exe. Did it fix the problem? If so, I'll distribute it in future versions.

Edit:
If the above version of mshtml doesn't work, try this version. It's signed a little differently.
Edited: 2013-02-14, 12:37 am
#12
Awesome, thanks for the quick reply. The first mshtml file did the trick. Everything is working just fine now. I tried the alternate version as well, but it gave a similar error.
#13
I have just released version 2.0 of JGlossator.

Download JGlossator v2.0 via SourceForge

What Changed?

● Added the Grammar Pane which includes a CaboCha dependency tree, jCorrect tips, and Obi-2 grade level. To enable: Options -> Appearance -> Show the grammar pane.

● Added the Kanji Info dialog. It contains a stroke order diagram, list of Heisig/community primitives, radicals, etc. Click on one of the large kanji in the Kanji Pane to display it.

● Updated the pitch accent database to match Rikaisama.

cb4960
Edited: 2013-03-02, 11:31 pm
#14
How "effective" do you think this Obi-2 grade level is when dealing with simple one-line sentences?

For example, this sentence is treated as 13, beyond high school:
恋人の死の知らせに彼女は大いに心を乱した。

The Hayashi score is 95 (easy), which seems legit, unlike the Obi-2 result.

Perhaps the Hayashi method is better for simple sentences, and Obi-2 for analysis of large texts such as essays or books?
Edited: 2013-03-04, 4:50 am
#15
This is the ultimate Japanese learning tool.
Amazing job, cb4960! Thank you for the time you spent (and spend) on this one!
#16
toshiromiballza,
I'll look into adding Hayashi in a future version.

Tyreon,
You're welcome.
#17
I have just released version 3.0 of JGlossator.

Download JGlossator v3.0 via SourceForge

What Changed?

● Added word frequency. For a particular word, frequency is in terms of the number of words that are more frequent than it.

Example:
学生 would return 1175, meaning that there are 1175 words more frequent than 学生.

Frequency numbers are colored green-ish, yellow-ish, orange-ish, or red-ish to indicate very common [0 - 5,000], common [5,001 - 10,000], uncommon [10,001 - 20,000], and rare [20,001 - infinity] words. Ranges can be modified in the options.

Frequency information comes from analysis of 5000+ novels (via Japanese Text Analysis Tool). Frequency of words in other mediums, such as newspapers, might vary. Not all words have frequency information. It is possible for multiple words to have the same frequency.

● Added option to indicate (via an asterisk) if a word is present in your Anki vocabulary deck. Both Anki 1 and Anki 2 decks are supported. See the "Gloss 2" tab in the options dialog to enable.

● Updated EDICT J-E dictionary, names dictionary, pitch accent database, and Tatoeba example sentences database.

● Updated Mecab to v0.996.
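The frequency number from the first item in this list (how many words are more frequent than a given word) and its color bands can be sketched in a few lines of Python. This is a hedged illustration, not the code used by JGlossator or the Japanese Text Analysis Tool: the function names, the band labels as strings, and the toy counts are all invented, and each upper bound is assumed to be inclusive (so 10,000 still counts as common).

```python
def rank_words(counts):
    """Map each word to the number of words that occur strictly more often.

    Words with equal counts share the same number, which is one way
    several words can end up with the same frequency.
    """
    ordered = sorted(counts.values(), reverse=True)
    first_index = {}
    for i, count in enumerate(ordered):
        # The first position of a count equals how many words beat it.
        first_index.setdefault(count, i)
    return {word: first_index[count] for word, count in counts.items()}


# Default upper bounds from the post; assumed inclusive.
BANDS = [(5000, "very common"), (10000, "common"), (20000, "uncommon")]


def band(rank, bands=BANDS):
    """Return the label of the first band whose upper bound covers the rank."""
    for upper, label in bands:
        if rank <= upper:
            return label
    return "rare"


# Toy occurrence counts, invented for illustration.
ranks = rank_words({"a": 10, "b": 7, "c": 7, "d": 1})
print(ranks)       # {'a': 0, 'b': 1, 'c': 1, 'd': 3}
print(band(1175))  # very common
```

Passing a different `bands` list mirrors the note that the ranges can be changed in the options.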

cb4960
Edited: 2013-05-07, 11:55 pm
#18
cb4960, thank you for another absolutely wonderful tool. Using it is like having a Japanese teacher sitting next to you all the time :)

Did you consider making an app like LWT or FLTR? I love the concept they came up with, but the actual tools are simply horrible for Japanese.
#19
I have just released version 4.0 of JGlossator.

Download JGlossator v4.0 via SourceForge

What Changed?

● Added definition search. If the search text contains only English, the EDICT definitions will be searched instead of performing the normal gloss. Entries will be sorted by frequency. To search for whole words only, add "w/" in front or add a trailing space (example: "w/experiment" or "experiment "). To perform a regular expression search, add "r/" in front (example: "r/exp\w*?l"). Screenshot:

[Image: def_search.png]

You can right-click on the entries as usual to show entries from other dictionaries, save, etc.

● Added kanji search. Allows you to search for kanji based on meaning, RTK primitives, radical meanings, ON readings, or KUN readings. In the screenshot below, I am searching for all kanji containing the RTK primitives "person" and "ten":

[Image: kanji_search.png]

The kanji are sorted based on number of strokes and then by frequency.

Search Based On     Format
Meanings            km/<comma-separated list of meanings>
RTK Primitives      kp/<comma-separated list of RTK primitives>
Radical meanings    kr/<comma-separated list of radical meanings>
ON readings         ko/<comma-separated list of ON readings>
KUN readings        kk/<comma-separated list of KUN readings>

● Added Frequency save token (%f).

● Added "Play Audio" and "Show Examples" checkbox to Options context menu.

● Updated frequency database to include words not in EDICT.

● Fixed bug that prevented frequency lookup of katakana words.

● If frequency was found based on reading add "_r" to the frequency.

● Use "*r" if word is found in Anki deck based on reading.

● ASCII numbers in expressions are now converted to Japanese numbers to facilitate pitch/audio/frequency lookups.

● Modified some of the frequency colors to match Rikaisama.

cb4960
Edited: 2013-06-22, 9:38 am
#20
Help! The program doesn't work on my computer (Win 7 SP1 x64, JGlossator v4.0). When I start JGlossator, this message appears: [Image: ycz.png]
Log file

I think it has something to do with the fact that .NET Framework 3.5.1 is built into Win 7 SP1. And it seems that I can't remove it and install 3.5 instead =(
#21
The .Net assemblies that you're using look the same as mine so you probably have a compatible version of the .Net framework. JNovelFormatter has similar requirements, can you get that to run?

The only way I was able to reproduce the issue was to mangle the InputTextBoxSize setting in themes.ini. It should be something like "InputTextBoxSize = 14.0", but something like "InputTextBoxSize = 14.0hellomom" will cause it to fail. I don't suppose you did something like that? If not, did you change any of the default settings?
Edited: 2013-07-08, 8:13 pm
#22
JNovelFormatter works without a problem.
Quote:I don't suppose you did something like that? If not, did you change any of the default settings?
No, of course not. I've tried running all the previous versions, and they all fail the same way. I also tried running it on another machine (with a similar config, though) and got the same thing.
What else... I've played with the permission settings; tried deleting the settings file to generate a new one; tried running in safe mode; tried reinstalling the framework and removing v4; and some other weird voodoo stuff *-*
Edited: 2013-07-09, 5:36 am
#23
I did it! The reason behind this was region format settings (Region and language - Formats). I changed it from Russian to English and it worked. It seems that the program tries to input time data or something in specific region format and fails.
And JGlossator looks great so far! Straight off, I have a suggestion: support for text-to-speech software to voice the whole copied text would be super cool.
Edited: 2013-07-09, 7:02 am
#24
Reddeath Wrote:I did it! The reason behind this was region format settings (Region and language - Formats). I changed it from Russian to English and it worked. It seems that the program tries to input time data or something in specific region format and fails.
The default decimal separator for Russian is ',' instead of '.' so it probably choked on that.
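To show why a decimal separator could matter for a setting like "InputTextBoxSize = 14.0" (mentioned earlier in this thread), here is a small Python emulation of separator-sensitive number parsing. It is only a sketch of the suspected failure mode: `parse_decimal` is a made-up helper, not a .NET or JGlossator function, and whether this is what actually happened inside JGlossator is not established in this thread.

```python
def parse_decimal(text, decimal_sep):
    """Parse a number that uses the given decimal separator.

    Emulates a culture-sensitive parser: any character other than digits,
    a sign, or the expected separator is rejected with ValueError.
    """
    text = text.strip()
    allowed = set("0123456789+-" + decimal_sep)
    if not text or any(ch not in allowed for ch in text):
        raise ValueError(f"cannot parse {text!r} with separator {decimal_sep!r}")
    return float(text.replace(decimal_sep, "."))


print(parse_decimal("14.0", "."))  # 14.0

# With ',' as the expected separator, the same settings value is rejected.
try:
    parse_decimal("14.0", ",")
except ValueError as err:
    print("failed:", err)
```

A common way to avoid this class of problem is to read and write settings files with one fixed, locale-independent number format regardless of the user's regional settings.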

Fun fact: some of your applications (and websites, depending on browser settings) that were previously in Russian will now be in English. If you change the time format to Japanese, they'll be in Japanese (if you have the relevant localization files for them. If not, they'll likely default to English). The alphabetical sorting of filenames with Cyrillic letters may or may not be messed up.
Edited: 2013-07-09, 7:35 am
#25
Vempele Wrote:The default decimal separator for Russian is ',' instead of '.' so it probably choked on that.
The default is ',' on my system as well, and it works normally.