
OSX → Let's build a Japanese Grammar Dictionary.app

#1
I finally figured out how to build customized dictionaries in OSX that work with the Dictionary application. So, if you love Dictionary.app and think it would be useful to have a grammar database, etc., built into it, let's get together and work on it.
#2
The built-in OS X dictionary is my lifeblood.
Can you write some notes on how to start building databases and whatnot? Or point to some good reading material for getting familiar with the dictionary databases?

I've been meaning to do a bit of grammar curating and make lists organized by expressive function/utility. Whether or not I'll ever do it is another matter :D
#3
Download the Auxiliary Tools for Xcode package (needs an Apple ID and free registration in Apple's developer program)

OR just get the files from a link on this website.

Basically you will need 4 files (templates bundled with the Xcode samples):
① plist: Meta info about the dictionary.
② css: Styles.
③ xml: The dictionary data.
④ makefile (I made some edits to the makefile to make everything flush, compile, deploy and cleanup in one swoop, available upon request).

Here is a sample of the .xml file from one of the dictionary bundles I am working on.
Common Japanese Collocations book by Kazuko Shoji
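Quote:
<?xml version="1.0" encoding="UTF-8"?>
<d:dictionary xmlns="http://www.w3.org/1999/xhtml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">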
<d:entry id="body_hygiene" d:title="お風呂・湯船">
<d:index d:value="おふろ"/><d:index d:value="風呂"/>
<d:index d:value="ゆぶね"/><d:index d:value="湯船"/>
<d:index d:value="home"/><d:index d:value="body"/>
<d:index d:value="hygiene"/><d:index d:value="bath"/>

<span class="headword">お風呂・湯船</span>
<span class="hyouki"> HOME </span>
<span class="hinshi">Body and Hygiene</span>

<div class="meaning">
<p>風呂にお湯を入れる「ホテルの部屋に戻ると、すぐにお風呂にお湯を入れた」</p>
<p>風呂のお湯を落とす</p>
<p>風呂を沸かす「帰ったらすぐにお風呂に入りたいから、お風呂を沸かしておいてね」</p>
<p>風呂・湯船につかる「こんな寒い日は、熱いお風呂にゆっくりつかりたい」</p>

</div>
</d:entry>
</d:dictionary>
It looks like this:
[Image: k37hch.jpg]

Notes:
→ I'm using OS X's speech recognition to input the Japanese, and it's working flawlessly (really). In fact, it's recognizing my Japanese better than my English, which makes absolutely no sense.
→ I'm using a simple regex find-and-replace to write the HTML, but I'm sure it could be improved to batch-build the whole dictionary from material already out there, or from new material plus a group effort.
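
A minimal Python sketch of that kind of regex find-and-replace, assuming the source text has one collocation (with its optional 「example」) per line. The function name `lines_to_paragraphs` is made up for illustration, and this is not the poster's actual regex:

```python
import re
from xml.sax.saxutils import escape


def lines_to_paragraphs(text):
    """Wrap each non-empty line in <p>...</p> for use inside a d:entry."""
    # Escape &, < and > first so the result stays well-formed XML.
    escaped = escape(text)
    # (?m) lets ^ and $ match at every line; \S skips blank lines,
    # and the [ \t]* groups trim leading/trailing spaces and tabs.
    return re.sub(r"(?m)^[ \t]*(\S.*?)[ \t]*$", r"<p>\1</p>", escaped)


if __name__ == "__main__":
    print(lines_to_paragraphs("風呂を沸かす\n風呂のお湯を落とす"))
```

The output can be pasted between `<div class="meaning">` and `</div>` in an entry like the one above.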

More Info:
Official Docs.
http://nagpals.com/posts/mac-dictionaries/
http://mac4translators.blogspot.jp/2007/...t-kit.html
Edited: 2013-01-13, 2:20 am
#4
Nice work.

It seems like writing a Python script (or whatever) to take spreadsheets of mined books/example sentences/etc. and put them into the XML dictionary format would be the way to go, considering the abundance of mined material that's available.
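
Something along these lines might work as a starting point. It is only a sketch: the column names (`id`, `title`, `index`, `sentence`) and the semicolon separator are assumptions about how the spreadsheet would be exported to CSV, not an existing format. The namespace URIs are the ones used in the kit's sample XML.

```python
import csv
import io
from xml.sax.saxutils import escape, quoteattr

HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<d:dictionary xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">\n'
)


def row_to_entry(row):
    """Render one spreadsheet row as a <d:entry> element."""
    # Hypothetical columns: id, title, index (semicolon-separated), sentence.
    keys = [row["title"]] + [k for k in row["index"].split(";") if k]
    indexes = "".join(f"<d:index d:value={quoteattr(k)}/>" for k in keys)
    return (
        f"<d:entry id={quoteattr(row['id'])} d:title={quoteattr(row['title'])}>\n"
        f"{indexes}\n"
        f"<p>{escape(row['sentence'])}</p>\n"
        "</d:entry>\n"
    )


def csv_to_dictionary_xml(csv_text):
    """Convert CSV text (with a header row) into dictionary source XML."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return HEADER + "".join(row_to_entry(r) for r in rows) + "</d:dictionary>\n"


if __name__ == "__main__":
    sample = "id,title,index,sentence\ng1,について,about,日本について話す\n"
    print(csv_to_dictionary_xml(sample))
```

Building the markup with `escape` and `quoteattr` keeps stray `&` or `<` characters in mined sentences from breaking the XML.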

At first I thought you were going to create a grammar index, so one could look up something like "について" or "ため(に・の)" and then get grammar information/sentences -- basically packing something like jgram.org or into mac's native dictionary format.
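
For lookups like that, each spelled-out variant would probably need its own d:index value. A rough Python sketch, assuming headwords are written in the "stem(option・option)" style used above; `expand_variants` and `index_elements` are hypothetical helper names:

```python
import re
from xml.sax.saxutils import quoteattr


def expand_variants(headword):
    """Expand 'ため(に・の)' into the bare stem plus each stem+option form."""
    # Accept both full-width and ASCII parentheses around the options.
    match = re.fullmatch(r"(.+?)[((](.+)[))]", headword)
    if not match:
        return [headword]
    stem, options = match.groups()
    return [stem] + [stem + opt for opt in options.split("・") if opt]


def index_elements(headword):
    """Render one <d:index> element per variant of the headword."""
    return "".join(
        f"<d:index d:value={quoteattr(v)}/>" for v in expand_variants(headword)
    )


if __name__ == "__main__":
    print(expand_variants("ため(に・の)"))
    print(index_elements("について"))
```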

What's your ideal goal? Turn that collocations book into an OS X dictionary?
#5
That was just an example.
kodorakun wrote: At first I thought you were going to create a grammar index, so one could look up something like "について" or "ため(に・の)" and then get grammar information/sentences -- basically packing something like jgram.org or into mac's native dictionary format.
This is exactly what I am aiming for.
#6
I made a simple kanji / RTK dictionary:

[Image: kanji_dictionary.png]

It can be downloaded from http://lri.me/upload/Kanji.dictionary.tgz.

Creating and installing an example dictionary:

- Register a free developer account and download the auxiliary tools package from https://developer.apple.com/downloads
- Rename the Dictionary Development Kit directory so that it doesn't contain spaces
- Change DICT_BUILD_TOOL_DIR in the makefile
- cd /path/to/DictionaryDevelopmentKit/project_templates; make && make install

The MyDictionary.xml I used looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<d:dictionary xmlns="http://www.w3.org/1999/xhtml" xmlns:d="http://www.apple.com/DTDs/DictionaryService-1.0.rng">
<d:entry id="kanji_1" d:title="亜">
<d:index d:value="亜"/>
<d:index d:value="Asia"/>
<p><span style="font-size: 1.3em">亜 </span>Asia | rank next, come after, -ous | ア、アシア、つぐ N や、つぎ、つぐ | 1809</p>
</d:entry>
</d:dictionary>

The dictionary name comes from CFBundleName in Info.plist, and the bundle name comes from DICT_NAME in the makefile.
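
Before running make on a large generated file, a quick sanity check can save a failed build. The sketch below assumes (as far as I know, this is what the kit expects) that every d:entry id should be unique; `duplicate_entry_ids` is a hypothetical helper, not part of the Dictionary Development Kit:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URI bound to the d: prefix in the sample XML above.
D_NS = "http://www.apple.com/DTDs/DictionaryService-1.0.rng"


def duplicate_entry_ids(xml_bytes):
    """Return the sorted list of entry ids that occur more than once."""
    # ET.fromstring raises ParseError if the XML is not well-formed.
    root = ET.fromstring(xml_bytes)
    counts = Counter(e.get("id", "") for e in root.iter(f"{{{D_NS}}}entry"))
    return sorted(entry_id for entry_id, n in counts.items() if n > 1)


if __name__ == "__main__":
    with open("MyDictionary.xml", "rb") as f:
        print(duplicate_entry_ids(f.read()))
```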
Edited: 2013-06-22, 3:48 am
#7
Looks great ~~~ :)

Thank you! (bow)

EDIT: Wow, your website is amazing. Bookmarked.
Edited: 2013-01-13, 7:18 am
#8
Thanks for the kanji dictionary, nice proof of principle.

Could use a little more metadata and editing to get nice multi-line dictionary entries that have things like frequencies, etc.

If you look up "使う" in the Japanese-English native dictionary you can see an example of the type of formatting available (it's pretty sophisticated).

One big question I have is: does anyone know how to do dictionary searches that search for the entered info as any _part_ of an entry? i.e. if I looked up "話" I would get entries like "話す" and "会話" -- the dictionary seems to be limited to "starting position search" only.
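
One possible workaround, untested against Dictionary.app and assuming lookups really do prefix-match on d:index values as described above: index every suffix of the headword, so that "話" is a prefix of one of the index values for "会話". `suffix_index_values` is a made-up helper name.

```python
def suffix_index_values(headword):
    """Return every suffix of the headword, longest first."""
    return [headword[i:] for i in range(len(headword))]


if __name__ == "__main__":
    # Each value would become one <d:index d:value="..."/> in the entry.
    for word in ("会話", "話す"):
        print(word, suffix_index_values(word))
```

The trade-off is a much larger index, and very short suffixes (a lone "す", for example) would match a lot of entries, so some filtering would likely be needed.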

Regarding a grammar dictionary, it seems we'll need to settle on a format (i.e. a grammar data structure or something like that). Then it would just be a matter of combining the known grammar resources into a spreadsheet or the like, from which a single script could write the fields to the proper XML file and generate the dictionary.

I might try to convert the KM2 grammar spreadsheet into a dictionary with the example sentences included. The hard part will probably be deciding what the "title" of each Japanese grammar entry should be.
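
To make the idea concrete, here is one possible shape for such a data structure, sketched in Python. Every field name is a suggestion up for debate, not an agreed format, and `merge_entries` is a hypothetical helper for folding several resources into one entry per grammar point:

```python
from dataclasses import dataclass, field


@dataclass
class GrammarEntry:
    pattern: str                                   # headword, e.g. "について"
    meaning: str = ""                              # short English gloss
    examples: list = field(default_factory=list)   # example sentences
    sources: list = field(default_factory=list)    # where the data came from


def merge_entries(entries):
    """Combine entries that share a pattern, keeping first-seen order."""
    merged = {}
    for e in entries:
        target = merged.setdefault(e.pattern, GrammarEntry(e.pattern))
        # Keep the first non-empty meaning; append unseen examples/sources.
        target.meaning = target.meaning or e.meaning
        target.examples += [x for x in e.examples if x not in target.examples]
        target.sources += [s for s in e.sources if s not in target.sources]
    return list(merged.values())


if __name__ == "__main__":
    a = GrammarEntry("について", "about", ["日本について話す"], ["sheet A"])
    b = GrammarEntry("について", "", ["映画について書く"], ["sheet B"])
    print(merge_entries([a, b]))
```

Each merged entry would then map onto one d:entry, with `pattern` as the d:title.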

Wanna toss around some formatting ideas?
#9
I'm using the 'same' format as the 大辞林 built into Dictionary.app. As for a source, there are many good ones out there, but the one I really like is 日本語表現文型辞典. I actually ended up using Anki to help automate creating the entry markup :D I append the markup to the main XML file manually, then use a batch file to clean up, build, and deploy everything in one neat step.
Edited: 2013-01-16, 6:25 am
#10
This is huge for me! I use Dictionary.app's 大辞林 all the time and I love building little tools to interact with my Anki deck. So I made a kanji dictionary for myself using sentences and readings that exist in my deck.

Here's where I am now:

[Image: 2013-01-16zMnPdVkG.png]

... and the script to generate it is on github.

Thank you Marumaru and lauri_ranta!
Edited: 2013-01-16, 11:37 am
#11
I'm glad, then! I also want to thank lauri_ranta: I've been using her zen-like site to review vocabulary by よみ these past couple of days.
#12
And another dictionary! Yesterday I was curious how I knew the word 不景気, and my dictionary told me right away that I got it from Tae Kim:

[Image: 2013-01-16N1yoRRxc.png]

And the script.
#13
Someone want to port A Dictionary of Basic/Intermediate Japanese Grammar to this? Wink
#14
I think 日本語表現文型辞典 would be the better choice, though.
#15
Sorry if this is an ignorant question.

But do these work on Windows 7? :/
#16
I'm not sure, but I don't think they do.
#17
Marumaru wrote: I think 日本語表現文型辞典 would be the better choice, though.
Marumaru, do you have a CSV or spreadsheet of that dictionary? If so, I think I could build it...
#18
Let's make one ourselves :) since there (probably) isn't one.
It will take a while, though.
Edited: 2013-01-23, 10:52 pm