Script to grab Japanese definitions for wordlists

Reply #1 - 2009 May 26, 11:18 am
Tobberoth Member
From: Sweden Registered: 2008-08-25 Posts: 3364

With the release of mentat_kgs's awesome Ruby script for ripping TBS news, I thought I'd create a simple script myself.

Most people here read tons of Japanese stuff, and we all know how boring it is to have a dictionary by your side and constantly look words up. Some people get around this by simply reading and then noting each new word in a list. This list can then later be used for mining.

What this script does is take a simple list of Japanese words, search dic.yahoo.co.jp for definitions and example sentences, and then put together a file with this information. It works with both Daijisen and Daijirin.
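
A rough sketch of that flow, not the actual grabdef.rb. The real script queries dic.yahoo.co.jp with hpricot; here the lookup is passed in as a block so nothing is assumed about that site, and the names read_words and build_report are made up for illustration.

```ruby
# Hypothetical helpers, not taken from grabdef.rb.

# Reads one word per line, dropping blank lines and a leading BOM if present.
def read_words(path)
  File.readlines(path, encoding: "BOM|UTF-8").map(&:strip).reject(&:empty?)
end

# Builds the report text: each word followed by its numbered definitions.
# The block receives a word and must return an array of definition strings.
def build_report(words)
  words.map do |word|
    definitions = yield(word)
    lines = definitions.each_with_index.map { |text, i| "#{i + 1} #{text}" }
    ([word] + lines).join("\n")
  end.join("\n\n") + "\n"
end

# Stub lookup standing in for the real dictionary query.
stub_lookup = ->(word) { ["definition of #{word} goes here"] }
puts build_report(["言葉", "翌日"], &stub_lookup)
```

Keeping the lookup separate from the file handling means the same report code would work whichever dictionary the lookup talks to.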

Like mentat's script, you need Ruby and hpricot to use it.

http://rapidshare.com/files/237794913/grabdef.zip

It's very easy to use. Simply put the script in a folder and put a text file with a list of words in the same folder. Then run the script from the terminal, supplying the input and output filenames as arguments. Example:
In a folder you have grabdef.rb. You save input.txt in the same folder, containing this:
言葉
翌日
探す
You open the terminal and run "ruby grabdef.rb input.txt output.txt". Once it has finished, you'll have a new file, output.txt, which contains something like
言葉
1 日本語の定義はここぞ.

etc.

I've only tried the script on Linux; hopefully it works on Windows and OS X as well. The input file must be saved as UTF-8. You don't have to supply the output filename: if it's omitted, the script will save the definitions as "output.txt", overwriting any existing file with that name in the folder.
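
For anyone curious, the argument handling and the UTF-8 requirement could look roughly like this. It is only a sketch under my own assumptions: parse_args, read_utf8 and DEFAULT_OUTPUT are illustrative names, and the actual script may do it differently.

```ruby
# Hypothetical names, not taken from grabdef.rb.
DEFAULT_OUTPUT = "output.txt"

# Returns [input_path, output_path]; the output falls back to output.txt.
def parse_args(argv)
  input, output = argv
  raise ArgumentError, "usage: ruby grabdef.rb input.txt [output.txt]" if input.nil?
  [input, output || DEFAULT_OUTPUT]
end

# Reads the whole file as raw bytes, labels them UTF-8, and fails early
# if the bytes are not valid UTF-8 (for example a Shift_JIS file).
def read_utf8(path)
  text = File.read(path, mode: "rb").force_encoding("UTF-8")
  raise ArgumentError, "#{path} is not valid UTF-8" unless text.valid_encoding?
  text
end
```

Checking the encoding up front gives a clear error message instead of garbled lookups later on.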

Final note: it searches Daijisen by default. If you want to use Daijirin instead, open grabdef.rb in a text editor and change
DNAME         = "&dname=#{DAIJISEN}"
to
DNAME         = "&dname=#{DAIJIRIN}"
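
If editing the file each time gets tedious, the choice could be wrapped in a small helper along these lines. This is a sketch only: the identifier strings below are placeholders I made up, since the real DAIJISEN and DAIJIRIN values live in grabdef.rb, and dname_param is a hypothetical name.

```ruby
# Placeholder identifiers; substitute the real DAIJISEN / DAIJIRIN
# values from grabdef.rb.
DICTIONARIES = {
  "daijisen" => "DAIJISEN_ID",
  "daijirin" => "DAIJIRIN_ID",
}.freeze

# Returns the "&dname=..." query fragment for the named dictionary,
# defaulting to Daijisen as the script does.
def dname_param(name = "daijisen")
  id = DICTIONARIES.fetch(name.downcase) do
    raise ArgumentError, "unknown dictionary: #{name}"
  end
  "&dname=#{id}"
end
```

The dictionary name could then come from a third command-line argument instead of a hand-edited constant.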

I know there are probably websites and maybe Anki plugins which do more or less the same thing, but this saves me time at least. Let me know if you find it useful. Please also let me know if you find any glitches or bugs, especially if you know Ruby and have suggestions. I know the output isn't very pretty at the moment; I might rework the script to create prettier output files later.

EDIT: I have edited the script to work on Windows and added a simple Readme.

Last edited by Tobberoth (2009 May 27, 8:40 am)

Reply #2 - 2009 May 26, 12:20 pm
nac_est Member
From: Italy Registered: 2006-12-12 Posts: 617 Website

It works! Thanks a lot, it's another great idea.
I'm thinking about how to use it in a smart way. It could be very useful for the times when I want to put a definition on a card. But I input one card at a time, so it wouldn't be efficient that way. I could instead type all the sentences on a text document beforehand, then locate all the words I want to look up, then look them up with your nice script and finally enter them quickly into Anki.
How do you do it?

Reply #3 - 2009 May 26, 3:52 pm
ahibba Member
Registered: 2008-09-04 Posts: 528 Website

That's great.

Anyone tried it (and the other script of mentat_kgs) on Windows?

It would be more helpful if it was an Anki plugin.

Reply #4 - 2009 May 26, 4:03 pm
Tobberoth Member
From: Sweden Registered: 2008-08-25 Posts: 3364

While I agree it might have been easier to use as a plugin to Anki, we should remember that not everyone uses Anki. A plugin for Anki would only work there, this script works everywhere.

As for Windows, you should really have no problem with my script since it doesn't call any external commands; it's all pure Ruby. There are two problems that can come up:
1. The input needs to be UTF-8, and Notepad kinda sucks there. As long as you have a decent text editor (e text editor, Notepad++ or SciTE, for example) it shouldn't be a problem.
2. You need RubyGems installed and then, using that, install hpricot. It shouldn't be any harder to install on Windows than on Linux, but I haven't tried it myself.

Last edited by Tobberoth (2009 May 26, 4:04 pm)

Reply #5 - 2009 May 26, 5:17 pm
ahibba Member
Registered: 2008-09-04 Posts: 528 Website

Thank you for the explanation.

1. I don't have problems with UTF-8. I have Win32Pad, EditPad Pro and other advanced text editors.

2. This is why I asked if someone has tried it. Because I don't have these things. I know that Ruby is a programming language, but I never used it. I'll search for it and hpricot.

Reply #6 - 2009 May 26, 5:25 pm
Tobberoth Member
From: Sweden Registered: 2008-08-25 Posts: 3364

ahibba wrote:

Thank you for the explanation.

1. I don't have problems with UTF-8. I have Win32Pad, EditPad Pro and other advanced text editors.

2. This is why I asked if someone has tried it. Because I don't have these things. I know that Ruby is a programming language, but I never used it. I'll search for it and hpricot.

Don't worry about the installation, Ruby is very easy in this regard. First, go to www.ruby-lang.org and get the Ruby one-click installer. As the name suggests, it's really easy to install.

Once installed, if you know how, you should probably add the directory containing the ruby executable to your PATH (if the one-click installer doesn't do that for you, which is possible). That way, when you're in command-line mode (Start -> Run and type cmd), you can simply type "ruby rubyscript.rb" to run scripts. Anyway, to get RubyGems, just go here and download the latest .zip:
http://rubyforge.org/frs/?group_id=126

Unpack the zip, go into that directory using cmd and run "ruby setup.rb". When done, simply run "gem install hpricot" from the cmd and it should automatically download and install hpricot for you.
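
To confirm the install worked before running the script, a quick check like this should do. It is a minimal sketch; gem_available? is a name I made up, and it relies only on Ruby's standard require and LoadError.

```ruby
# Hypothetical helper: returns true if the named library can be loaded,
# false if Ruby cannot find it.
def gem_available?(name)
  require name
  true
rescue LoadError
  false
end

if gem_available?("hpricot")
  puts "hpricot is installed"
else
  puts "hpricot is missing; run: gem install hpricot"
end
```

Save it as a .rb file and run it with ruby from cmd; it only prints a message and changes nothing.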

Once done, you can run my script just fine. mentat's script will work as well (but you need to get mencoder for that one to work; instructions are in his thread).

Last edited by Tobberoth (2009 May 26, 5:26 pm)

Reply #7 - 2009 May 27, 5:52 am
superdry New member
Registered: 2006-04-02 Posts: 4

Not working for me on Windows. I'm only getting gibberish in the output file.

Reply #8 - 2009 May 27, 6:16 am
mentat_kgs Member
From: Brasil Registered: 2008-04-18 Posts: 1671 Website

To make it work on Windows, you need to change line 55
from

File.open(save_file, 'w') do |file|
to
File.open(save_file, 'wb') do |file|
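
A small self-contained version of the same idea, for testing outside the script. As far as I know, the 'b' flag tells Ruby to skip the newline translation Windows applies in text mode and to write the string's bytes untouched. save_definitions is just an illustrative name, not a function from grabdef.rb.

```ruby
# Hypothetical helper: writes text to path in binary mode so the bytes
# on disk are exactly the bytes of the string on every platform.
def save_definitions(path, text)
  File.open(path, "wb") do |file|
    file.write(text)
  end
end
```

Reading the file back with File.binread and comparing it to the original string's bytes is an easy way to verify that nothing was altered on the way out.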

Reply #9 - 2009 May 27, 6:43 am
cangy Member
From: 平安京 Registered: 2006-12-13 Posts: 372 Website

Nice idea. I was thinking of doing something similar for studying sentences, feeding them into http://www.csse.monash.edu.au/~jwb/cgi- … dic.cgi?9T

Reply #10 - 2009 May 27, 8:38 am
Tobberoth Member
From: Sweden Registered: 2008-08-25 Posts: 3364

mentat_kgs wrote:

To make it work on Windows, you need to change line 55
from

File.open(save_file, 'w') do |file|
to
File.open(save_file, 'wb') do |file|

Ah, I'll be darned, it has to write the kanji as binary? Well, I'll fix it and upload a new version right away.
