"Learning With Texts" software + Japanese?

Cacawate
Member
From: California
Registered: 2006-12-07
Posts: 32
Website

I hate to sound like a complete newb here, but can you point me to any resources on how to do this in Windows? I've found the kakasi site, but I have no clue how to use it due to programming ignorance. I'll be Googling in the meantime, but any assistance would be much appreciated.

Reply #27 - 2011 October 25, 3:55 pm
Cacawate
Member
From: California
Registered: 2006-12-07
Posts: 32
Website

Ok, so I've fiddled with this for a month, coming from absolutely no programming experience, and created a spacer that you can use for LWT. You have to have Python installed on your computer (this was made in 2.7), along with the MeCab module that the script imports. Also, I'm almost done with the meat of a GUI for this, but still have to learn how to turn that into an executable file for Windows as well as translate it into a web app. I don't have access to root on my web server, so I may have to learn PHP or something. I really have absolutely no experience with programming. Here's the code. Save it as a .py file:

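#!/usr/bin/python
# -*- coding: utf-8 -*-
# Written for Python 2.7 (raw_input, print statement).
import os
import sys

import MeCab

# raw_input returns bytes in the console's encoding, so convert them to UTF-8.
# Assumption: the MeCab dictionary was installed with UTF-8 as its charset.
input_jp = raw_input("Please input Japanese here: ")
text = input_jp.decode(sys.stdin.encoding or "utf-8").encode("utf-8")

# -Owakati makes MeCab return the words separated by spaces.
mecab = MeCab.Tagger("-Owakati")
output = mecab.parse(text)
print output

save_to = raw_input("Please name the file you'd like to save it to: ")
# Close the file before Notepad opens it.
with open(save_to, "w") as out_file:
    out_file.write(output)

# Windows only: open the saved file in Notepad.
os.system("start notepad.exe " + save_to)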

Reply #28 - 2011 October 26, 1:58 pm
wccrawford
Member
From: FL US
Registered: 2008-03-28
Posts: 1548

I don't want you to think you're being ignored. I don't have time to mess with anything right now, but when I do, I'll be looking at using the knowledge you've provided with LWT and seeing how I can make it happen. Thanks for working on this!

Reply #29 - 2011 October 28, 3:11 pm
Cacawate
Member
From: California
Registered: 2006-12-07
Posts: 32
Website

Oh, it's no problem. This thread gets good hits on Google for LWT and parsing Japanese, so I'm updating it to help others that may not be forum users.

Actually, I'm not sure how much help it is as I'm still new to this programming business. :)

Reply #30 - 2011 October 28, 5:14 pm
kodorakun
Member
From: Seattle
Registered: 2008-10-15
Posts: 252
Website

Gah, sorry I haven't been tracking this thread, it seems you've worked out most problems for parsing by now, right?

If you've got mecab installed I don't see why you're going through all the scripting business. I just copy the text of interest into a text file "article.txt" and run "mecab -O wakati article.txt -o art.out", and then "art.out" contains the parsed text, which I drop right into LWT.
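If you do want to run that same command from a script, here is a minimal sketch. It is written for Python 3 (not the 2.7 used earlier in the thread) and assumes the mecab program can be found on your PATH. The function names wakati_command and run_wakati are made up for illustration.

```python
import subprocess


def wakati_command(input_path, output_path, mecab="mecab"):
    """Build the argument list for: mecab -O wakati INPUT -o OUTPUT."""
    return [mecab, "-O", "wakati", input_path, "-o", output_path]


def run_wakati(input_path, output_path, mecab="mecab"):
    """Run MeCab; raises CalledProcessError if mecab exits with an error."""
    subprocess.run(wakati_command(input_path, output_path, mecab), check=True)
```

Calling run_wakati("article.txt", "art.out") would then do the same thing as typing the command by hand, using the file names from the post.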

Anyway, glad to see LWT is getting some attention -- I'm still using it and loving it.

K.

Arcturus_Red
New member
From: India
Registered: 2011-04-22
Posts: 2

Any advice for a simple-minded Windows user?
Typing mecab <arguments> in a Windows command line doesn't do anything (duh). Tried using cacawate-san's script, but python doesn't recognise the mecab module. Tried saving the script in the same folder as the mecab exe's. Didn't help.


EDIT: Just had to add the Mecab folder to the PATH variable. Works perfectly now.
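A quick way to check whether the PATH change took effect is to ask Python where it finds the program. This is only a sketch for Python 3; the helper names mecab_location and describe are invented for the example.

```python
import shutil


def mecab_location(name="mecab"):
    """Return the full path of the executable found via PATH, or None."""
    return shutil.which(name)


def describe(location):
    """Turn the lookup result into a readable message."""
    if location is None:
        return "mecab is not on PATH; add its bin folder to PATH or cd into it"
    return "mecab found at " + location


print(describe(mecab_location()))
```

Open a new command prompt after editing PATH before running this, since a prompt that was already open keeps the old value.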

Last edited by Arcturus_Red (2012 January 04, 8:31 am)

khalhern
Member
From: UK
Registered: 2011-04-11
Posts: 33

In which directory do I put the input.txt files? I keep getting "No such file or directory" no matter where I put it :(

Arcturus_Red
New member
From: India
Registered: 2011-04-22
Posts: 2

khalhern wrote:

In which directory do I put the input.txt files? I keep getting "No such file or directory" no matter where I put it :(

C:\users\(your username) should work.

Or you could just cd to whichever directory they're in... Look for a DOS tutorial on google.
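The reason "No such file or directory" shows up is that a bare file name like input.txt is looked up in whatever directory the prompt is currently in. A small Python 3 sketch that shows where the lookup actually happens (resolve_input is a made-up helper name):

```python
import os


def resolve_input(name, cwd=None):
    """Return (absolute_path, exists) for a file name relative to cwd."""
    base = cwd if cwd is not None else os.getcwd()
    full = os.path.abspath(os.path.join(base, name))
    return full, os.path.isfile(full)


full, found = resolve_input("input.txt")
print("Looking for:", full)
print("Found" if found else "Not here; cd to the folder that holds the file")
```

If the printed path is not where you saved the file, either cd there first or give mecab the full path to the file.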

digitlhand
Member
From: Los Angeles CA
Registered: 2007-12-04
Posts: 98
Website

Arcturus_Red wrote:

Any advice for a simple-minded Windows user?
Typing mecab <arguments> in a Windows command line doesn't do anything (duh). Tried using cacawate-san's script, but python doesn't recognise the mecab module. Tried saving the script in the same folder as the mecab exe's. Didn't help.


EDIT: Just had to add the Mecab folder to the PATH variable. Works perfectly now.

Where does one find the PATH variable?

khalhern
Member
From: UK
Registered: 2011-04-11
Posts: 33

Hi digitlhand, which OS are you using? In both XP and Windows 7 it's accessible by right-clicking on My Computer and choosing Properties, then:

If it's Windows 7: Click on the link on the right hand side of the properties window that opens that says "Advanced system settings".

Then choose the "Advanced" tab, and at the bottom you'll see "Environment Variables". Click that, and you'll see a list of variables. Find the one that says "Path", and edit it. Add something like: ;C:\MeCab\bin\ (or wherever you have mecab installed).

You need the preceding ";" to divide paths, so if you ever need to add a new one after the mecab one, you need to start that with a ";" as well.
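To make the ";" rule concrete, here is a Python 3 sketch that treats the Path value as what it is: a list of folders joined by semicolons. It only builds the new string and does not change any Windows setting; append_to_path is a hypothetical helper, and the folders are just examples.

```python
def append_to_path(path_value, new_dir, sep=";"):
    """Return path_value with new_dir appended, unless it is already listed."""
    entries = [e for e in path_value.split(sep) if e]
    # Windows folder names are case-insensitive; ignore a trailing backslash.
    known = {e.rstrip("\\").lower() for e in entries}
    if new_dir.rstrip("\\").lower() not in known:
        entries.append(new_dir)
    return sep.join(entries)


print(append_to_path(r"C:\Windows\system32;C:\Windows", r"C:\MeCab\bin"))
```

Each folder is one entry, so a folder added later goes after another ";" in the same way.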

Hope that helps somehow!

(See also:)
XP: http://support.microsoft.com/kb/310519
Windows 7: http://www.itechtalk.com/thread3595.html



@Arcturus_Red: I tried putting the input file in "users" and got no results... then realised that for some reason I hadn't cd'd to the directory in DOS (doh!). Thank you SO much for the help - this worked PERFECTLY! I'm so happy :D

Last edited by khalhern (2012 February 12, 8:00 pm)

derDeja
New member
Registered: 2010-02-16
Posts: 2

Hello everybody!

I am struggling with this problem (no spaces in Japanese) right now. I read this thread but I cannot figure out what to do. I have Anki with the Japanese model working - it inserts furigana as the reading. So, do I have mecab? It may sound stupid, but I can't remember what I installed to get Anki running as it does now - it was quite a while ago. And btw I am working on MacOS.
Could someone please explain what to do on a Mac to get this Mecab thing running? I don't even know where to look to figure out if Mecab is already installed.
And: Could you possibly translate the DOS command line for Mac users?
I hope it doesn't sound too stupid! I searched this forum and Google for this problem with LWT and Japanese, but since I am no computer pro I guess I am missing something. I hope you can shed some light on it!

Thank you very much in advance!
Deja

khalhern
Member
From: UK
Registered: 2011-04-11
Posts: 33

Hi derDeja,

Sorry, but I can't help with how to follow these instructions in Mac OS :(

But if you have the Japanese plug-in installed, there should be a directory called "mecab" within the directory that you installed Anki. Inside the "mecab" directory is a /bin/ directory, which contains the mecab.exe file that you need to run to split the file.

An example in windows would be simply to do something like:

Code:

cd C:\Program Files\Anki\mecab\bin

Then inside that directory (bin), you need an input.txt file, which contains Japanese text (saved to UTF-8), then you can run something like this from the DOS-Prompt:

Code:

mecab.exe -O wakati input.txt -o output.txt

So that the full list of commands looks something like:

Code:

C:\>cd C:\Program Files\Anki\mecab\bin
C:\Program Files\Anki\mecab\bin>mecab.exe -O wakati input.txt -o output.txt

So you'll need to research how to a) change directories in the Mac OS command prompt (I think the "cd" command works?), and b) run programs with arguments (again, it might be exactly the same, simply running mecab.exe -O etc etc through the prompt).
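For anyone unsure where the mecab program lives, here is a rough Python 3 sketch that searches a folder tree for it and builds the same arguments as the Windows example. Where Anki keeps its files on a Mac isn't stated in this thread, so the folder to search is up to you; find_mecab and wakati_args are names invented for the example. On Mac and Linux the program is normally called just mecab, without the .exe.

```python
import os


def find_mecab(root):
    """Walk root and return paths of files named mecab or mecab.exe."""
    hits = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower() in ("mecab", "mecab.exe"):
                hits.append(os.path.join(folder, name))
    return sorted(hits)


def wakati_args(mecab_path, input_path="input.txt", output_path="output.txt"):
    """Same arguments as the Windows example; only the program path differs."""
    return [mecab_path, "-O", "wakati", input_path, "-o", output_path]
```

Something like find_mecab(os.path.expanduser("~")) searches your home folder, which can take a while. In a Mac terminal, cd works as it does in DOS, and a program in the current folder is started with ./ in front of its name, for example ./mecab -O wakati input.txt -o output.txt.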

Sorry if that doesn't help much but I hope it does!

derDeja
New member
Registered: 2010-02-16
Posts: 2

Dear khalhern!

Thank you very much for your explanations!
I finally found a Unix file called mecab which should do the job. I had tried to find it via Spotlight, the macOS search system, but that didn't find anything. But your hints set me on track! Thank you!
Now I have to find out how to run this program via Terminal - the macOS DOS. Maybe I will figure it out myself, although I am a little afraid of crashing my system if I fumble too much inside its heart. So any hints about running little programs plus arguments in Terminal would still be heartily appreciated.

Kind regards
Deja

Earthlark
Member
From: Japan
Registered: 2008-12-23
Posts: 22

How do you guys deal with separated words such as verb endings, e.g., 行っ て い ます?  I know for the definition you can expand to multiple words, but this doesn't actually bring the word together as far as I've been able to tell.  So, for example, for "ます", do you just hit "ignore" or "well known"?  The same with "て"?

khalhern
Member
From: UK
Registered: 2011-04-11
Posts: 33

Actually it kind of sucks, but I just delete by hand. If it's a long text that probably isn't viable, but if you know the root, you might be able to figure out the ending, I guess, because you can always just search individual kanji.

What I've been doing is just searching separated terms without spaces and so on, and just seeing if I can get some results. If I get ones that make sense, then just click the "Edit Text" button at the top, and remove the spaces ;)

I don't know if anyone else has a more productive way to do things, though yikes
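One way to cut down on the hand-editing would be to glue the most common endings back onto the word before them before pasting the text into LWT. The Python 3 sketch below is a naive heuristic, not something MeCab or LWT provides: the ENDINGS list is hand-picked and hypothetical, and it will sometimes join tokens that should stay separate.

```python
# Hypothetical, hand-picked tokens to attach to the preceding word.
ENDINGS = {"て", "い", "ます", "まし", "た", "ない"}


def merge_endings(wakati_line, endings=ENDINGS):
    """Join any token listed in endings onto the token before it."""
    merged = []
    for token in wakati_line.split():
        if merged and token in endings:
            merged[-1] += token
        else:
            merged.append(token)
    return " ".join(merged)
```

With this list, merge_endings("学校 に 行っ て い ます") gives "学校 に 行っています". You would still need to extend or trim ENDINGS to suit the texts you read.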

Earthlark
Member
From: Japan
Registered: 2008-12-23
Posts: 22

I unchecked the "show all" button and that makes things a bit better, showing the longest saved term.

I guess the best practice may be to just group the different endings: います, いました, etc?

I've just been using the program in conjunction with rikaisama (and EBwin) since it usually indicates the tense of the verb.