How many readings do you need to know?

Reply #126 - 2010 February 24, 1:07 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

@rich_f

I've been using UltraEdit for so long I don't know if I could change it. It seems to be able to handle lots of stuff. Got a script I can copy/paste? ;p

Last edited by nest0r (2010 February 24, 1:08 pm)

Reply #127 - 2010 February 24, 1:35 pm
rich_f Member
From: north carolina Registered: 2007-07-12 Posts: 1708

@nest0r

Nope, not yet. I still have to read the 47-page tutorial on regular expressions, and that's going to have to wait for a few days while I take care of Other Stuff. I've got a JLPT review class to prep for (I'm enjoying it), and an article to write.

EditPadLite has a functioning regular expression search box, and it should have the same PDF if you're really impatient. XD

Reply #128 - 2010 February 24, 1:43 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Funny that you mention a 'functioning' regex search box, because mine in UltraEdit is nonfunctioning. Looks like EditPadPro can handle Shift-JIS too, while I can't get that working in UltraEdit. Hmm.

My basic idea at the moment is Find:

kanji variable string thingy《kana variable string thingy》

and Replace with

<ruby><rb>same kanji as above</rb><rp>(</rp><rt>same kana as above</rt><rp>)</rp></ruby>

... That's assuming what I pasted from a rubified xhtml page would then work if I opened the file in Firefox. Also assuming I don't need to add line breaks if I need to save and open as html rather than click the txt and 'open with' Firefox.
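A rough sketch of that find/replace idea in Python, for illustration only. It assumes the "kanji variable string thingy" is a run of CJK ideographs (plus the repeat mark 々) sitting directly before the 《...》 reading; the character range and the name add_ruby are assumptions, not something taken from any of the posted scripts.

```python
import re

# Assumption: the ruby base is a run of kanji (U+4E00-U+9FFF) or the
# iteration mark 々 (U+3005) immediately before the 《reading》.
KANJI_RUN = r"[\u4e00-\u9fff\u3005]+"
RUBY_PATTERN = re.compile(rf"({KANJI_RUN})《([^》]+)》")

# Same replacement markup as in the post: \1 is the kanji, \2 the kana.
RUBY_MARKUP = r"<ruby><rb>\1</rb><rp>(</rp><rt>\2</rt><rp>)</rp></ruby>"


def add_ruby(text):
    """Turn every kanji《kana》 pair in text into HTML ruby markup."""
    return RUBY_PATTERN.sub(RUBY_MARKUP, text)


if __name__ == "__main__":
    print(add_ruby("漢字《かんじ》を読む"))
```

Kanji outside that range, or a base that mixes in kana, would be missed by this pattern and need a wider character class.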

Last edited by nest0r (2010 February 24, 1:48 pm)

Reply #129 - 2010 February 24, 2:31 pm
pm215 Member
From: UK Registered: 2008-01-26 Posts: 1354

It is, as they say, not *quite* that simple, since you also need to deal with the case where there is a start-of-ruby marker, so you probably want to replace those first to get them out of the way so they don't end up as false positives for the main replace.
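Something like the following two-pass sketch shows the ordering, hedged heavily: it assumes the start-of-ruby marker is the fullwidth vertical bar ｜ (U+FF5C) and that unmarked bases are plain kanji runs. The names MARKED, UNMARKED and convert are made up for the example.

```python
import re

RUBY = r"<ruby><rb>\1</rb><rp>(</rp><rt>\2</rt><rp>)</rp></ruby>"

# Pass 1: an explicit marker says exactly where the base text starts.
# Assumption: the marker is the fullwidth vertical bar (U+FF5C).
MARKED = re.compile(r"｜([^｜《》]+)《([^》]+)》")

# Pass 2: no marker, so the base is assumed to be the kanji run
# directly before the opening 《.
UNMARKED = re.compile(r"([\u4e00-\u9fff\u3005]+)《([^》]+)》")


def convert(text):
    """Handle marked ruby first so it cannot confuse the main replace."""
    text = MARKED.sub(RUBY, text)
    return UNMARKED.sub(RUBY, text)


if __name__ == "__main__":
    print(convert("｜東京タワー《とうきょうタワー》と桜《さくら》"))
```

Run the other way round, the kanji-only pattern would find no kanji directly before the first 《 here, since the base ends in katakana, and the marker would be left behind in the text.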

Reply #130 - 2010 February 24, 2:58 pm
rich_f Member
From: north carolina Registered: 2007-07-12 Posts: 1708

Yeah, there's that too.

And you could also save yourself a lot of headache by doing a global search for EOLs and replacing them with </p>, a newline, and <p>, so your paragraphs are formatted properly. You just have to add the one initial <p> at the top of the doc.

You can figure out how to build a regex for finding EOLs by reading the tutorial, I think. He says that the regex he uses in there is based on Perl's handling of regex, if I remember correctly.

Reply #131 - 2010 February 24, 5:07 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Maybe I can just get used to reading furigana inline/to the right instead of superscripted. ;p

Reply #132 - 2010 February 24, 5:58 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

So I clicked the .py file and an empty black window came up with a flashing cursor. Make it work. Also, that other file doesn't have an extension, what's it for. /n00b

Last edited by nest0r (2010 February 24, 5:59 pm)

Reply #133 - 2010 February 24, 6:37 pm
yudantaiteki Member
Registered: 2009-10-03 Posts: 3619

Thora wrote:

If I'm not mistaken, I think Kanji Jiten colour-codes by the level (elementary/middle/high) at which the character itself is introduced and the level at which other readings are later added. It also gives a sample common word. But it's not in list form, afaik.

It's a good site for all kinds of kanji info: New Jouyou list updates, history of name kanji changes, difficult readings, categorized by topic, radical info, kanken 1&2 lists?, etc.

Sometimes I wonder where the Kanji Kentei gets their list of readings.  I looked at the first three kanji in the "a" today in the library to see what I got (亜, 唖, and 娃).  First of all, 唖 is an invention of the JIS writers that doesn't actually appear in any dictionaries I looked at (I assume it's in the JIS dictionary, though).  Every dictionary has 口 + 亞 rather than 唖.  Minor point, though.

The other problem was that I could not find the reading わら(う) for 唖 in either the 大漢和辞典 or the 日本国語大辞典.  Morohashi had a *meaning* わらう, but the only non-dictionary quotation in the definition was 禹乃唖然而笑.  I'm pretty sure this is 唖然(あぜん) (my rather limited 訓読 ability would read this as 禹,スナワチ唖然トシテ笑フ)

うつくしい for 娃 and つぐ for 亜 were both feebly attested, but they're probably not worth actively learning.

Not sure what the point of this is, I guess just a caution for using lists of readings from even the kanji kentei, and reinforcing the idea that above a certain level, you should be learning from actual sources and not lists.

Last edited by yudantaiteki (2011 April 07, 9:59 pm)

Reply #134 - 2010 February 24, 7:06 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Found a perl script that supposedly does something similar: http://dotclue.org/archives/002990.html

I installed Perl but I don't know how to run those scripts either. I hate computer people. I love them and hate them.

Reply #135 - 2010 February 24, 7:19 pm
balloonguy Member
Registered: 2007-05-06 Posts: 54

nest0r wrote:

Or that will turn "kanji《kana reading》" into a phonetic guide in Word, or something?

This macro should work. I've only tested it on two documents, so there may be bugs. To get it to work in Word '03:
1)open up the text file you want to add rubi text.
2)Under tools->macro click Visual Basic Editor.
3)In the resulting window, click file->import file and select the posted file. Close the editor window.
4)Then again under tools->macro click macros, select rubier and click run.

Reply #136 - 2010 February 24, 7:27 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

balloonguy wrote:

nest0r wrote:

Or that will turn "kanji《kana reading》" into a phonetic guide in Word, or something?

This macro should work. I've only tested it on two documents, so there may be bugs. To get it to work in Word '03:
1)open up the text file you want to add rubi text.
2)Under tools->macro click Visual Basic Editor.
3)In the resulting window, click file->import file and select the posted file. Close the editor window.
4)Then again under tools->macro click macros, select rubier and click run.

You have no idea how much I love you right now. (Translation: Works a treat, took mere seconds, will continue testing it out.)

In case my gratitude didn't come across: THANK YOU. You're a mensch.

Now for the t-time files, converting <t-rb> or <t-r> before viewing as .html -- Oh screw it, I only have ~20 of that type.

Last edited by nest0r (2010 February 24, 9:18 pm)

Reply #137 - 2010 February 25, 2:51 am
rich_f Member
From: north carolina Registered: 2007-07-12 Posts: 1708

I also had success with some simple regular expressions in EditPad. (You can do this in Lite or Pro. I did it in Pro.) You need to do 4 search and replaces. Turn on the Regular Expression switch in the panel, then:

Copy the first pipe-looking thing (it's *not* a pipe) and paste it into the find box, then in the replace box, type: <ruby><rb> Replace all.

Copy the first << looking thing, paste, replace all with </rb><rt>

Copy/replace the >> with </rt></ruby>

Now the fun part: enter the following into the find box:
\r\n
And enter this into the replace box:
</p>\r\n<p>

It's not ideal, but this last step will replace each newline with a </p>, start a new line, then add a <p>. You'll have to fix up the top of the doc (the first paragraph is left without its opening <p>) and the bottom (a dangling <p> after the final newline).

Then just slap on the header.html and footer.html from iSoron's site.

It's not as fast as the macro, but if you don't have Word, it's a viable alternative.

You could just load up a bunch of txt files all at once, run step one on a crapton of them, then step 2, etc...

Last edited by rich_f (2010 February 25, 2:53 am)

Reply #138 - 2010 February 25, 3:38 am
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

面白い。I'd noticed someone (via Google) mentioning that pipe (I mean, 'ceci n'est pas une pipe') as a marker for which kanji the furigana applies to, but after the initial example occurrence, is it invisible in the text, or something? Same with the \r\n stuff. At any rate, perhaps some variation of this could be applied to t-time files... though now that I have 99% of the texts taken care of, I'm good, can finally stop obsessing. ^_^

Reply #139 - 2010 February 25, 6:40 am
pm215 Member
From: UK Registered: 2008-01-26 Posts: 1354

nest0r wrote:

So I clicked the .py file and an empty black window came up with a flashing cursor. Make it work. Also, that other file doesn't have an extension, what's it for. /n00b

This is all basically a culture issue. Traditional Unix culture is command-line based: you type commands, you don't click on things. In particular, this Python script is expecting you to type 'name-of-script inputfile outputfile' or something similar. Only that won't quite work on Windows because scripting like this is kind of alien and not very well grafted in, so you probably have to say 'python name-of-script infile outfile'.
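To make the 'infile outfile' convention concrete, here is a minimal skeleton of a script that takes its file names from the command line. It is a generic sketch, not the posted script: the conversion step is left as a plain copy, and everything in it is illustrative.

```python
import sys


def main(argv):
    # argv[0] is the script name; the two file names follow it.
    if len(argv) != 3:
        print(f"usage: python {argv[0]} infile outfile", file=sys.stderr)
        return 2
    infile, outfile = argv[1], argv[2]
    with open(infile, encoding="utf-8") as src:
        text = src.read()
    # A real script would convert the text here; this sketch copies it.
    # Opening with "w" creates the output file if it does not exist yet.
    with open(outfile, "w", encoding="utf-8") as dst:
        dst.write(text)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Double-clicking a script like this gives it no file names, so all it can do is print the usage line in a window, which roughly matches the "empty black window" experience.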

Also the input files need to be UTF-8, which is the One True Character Encoding. (You may find your files are in UCS-2, in which case you'll need to fix that first.)
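If a file does turn out to be in another encoding, re-saving it as UTF-8 can be done in most editors, or with a few lines of Python along these lines. This is a sketch under assumptions: the function name reencode is made up, and the source encoding has to be known or guessed. Python's "utf-16" codec reads UCS-2 text in the Basic Multilingual Plane, and "shift_jis" or "cp932" would be the value to try for Shift-JIS files.

```python
def reencode(src_path, dst_path, src_encoding="utf-16"):
    """Read a text file in src_encoding and write it back out as UTF-8."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 4:
        # e.g. python reencode.py book.txt book-utf8.txt shift_jis
        reencode(sys.argv[1], sys.argv[2], sys.argv[3])
```

A wrong guess at the source encoding usually fails with a UnicodeDecodeError rather than silently producing garbage, which at least makes the mistake visible.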

The file with no extension is a shell script, which is completely standard on unix but useless on Windows. That there is no extension is also a culture thing -- typically on unix you don't put extensions on command names, there's no point because the computer will correctly deal with whatever kind of script or binary it happens to be. Windows requiring '.exe' or '.bat' or whatever is ugly... (The shell script in this case isn't doing anything particularly important though, mostly it's just tacking a header and footer on the file.)

So, er, just saying that the reason this kind of thing is easy on unix is that there's a huge weight of tradition and infrastructure there that makes it easy. Windows doesn't have that, which is why taking scripts that work fine on Linux and making them work on Windows is sometimes tedious...

Reply #140 - 2010 February 25, 9:52 am
rich_f Member
From: north carolina Registered: 2007-07-12 Posts: 1708

In regex, \r stands for a CR, and \n stands for a LF. When you hit Enter/Return on a line of text in Windows, you generate a CR and an LF. In Unix, you just generate an LF. So all you're doing here is searching for that CR+LF at the end of paragraphs, putting in the close paragraph html tag, throwing the CR+LF back in, then adding an open paragraph tag at the beginning of the new line. It means you have to clean things up a little at the top and bottom of the doc, but that's the best I could think of.
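The same paragraph wrapping can be sketched in Python in a way that sidesteps the cleanup at the top and bottom, by wrapping each line instead of replacing the line breaks between them. This is an alternative to the search-and-replace, not a description of it: the name wrap_paragraphs is made up, and skipping blank lines is a choice of this sketch.

```python
def wrap_paragraphs(text):
    """Wrap every non-blank line in <p>...</p>, joined with CR+LF."""
    # splitlines() treats \r\n (Windows) and \n (Unix), among other
    # line boundaries, as the end of a line.
    lines = text.splitlines()
    return "\r\n".join(f"<p>{line}</p>" for line in lines if line.strip())


if __name__ == "__main__":
    print(wrap_paragraphs("一行目\r\n二行目\n"))
```

Because every line gets both its opening and closing tag, there is no stray <p> or </p> left over at either end of the document.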

As for the python scripts, yeah, that's what I was trying with OS X, but I wasn't having any sort of luck with that. I had the latest version of Python 2 loaded, and it still gacked. (Is it a Python 3 script or something?) (Do you have to create the output file before you run the script, or can it create the file if the file you specify for output doesn't exist?)

Cygwin will let you run bash scripts on Windows, but I don't particularly want to install a big chunk of stuff like that just to run one bash script.

nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

@pm215: Ahhh, Linux. I mean I clicked it, why else wouldn't it do stuff for me? Stupid Linux. ;p And UTF-8, I think everybody except for Aozora types are well acquainted with UTF-8.

@rich_f: I had tried to run that code with my broken UltraEdit regex stuff again, guess that's why it told me '\r\n' was not found or somesuch. ;p There's still the problem of the pipe-thingy being only on certain multi-kanji ruby or whatever.

Last edited by nest0r (2010 February 25, 10:20 am)

Asriel Member
From: 東京 Registered: 2008-02-26 Posts: 1343

So, what exactly is going on with this whole script-running fiasco?
Last I checked people were running analysis of vocab/kanji/etc...

Now we're making furigana out of ‹‹these guys›› ??

Isn't that what...Rikaichan is supposed to help out with?

nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Asriel wrote:

So, what exactly is going on with this whole script-running fiasco?
Last I checked people were running analysis of vocab/kanji/etc...

Now we're making furigana out of ‹‹these guys›› ??

Isn't that what...Rikaichan is supposed to help out with?

For me personally, I use Stardict as it works in every program, for the record. But my main goal before balloonguy saved me was to find a way to make a smooth reading experience the way it was intended with the formatting (and presumably why furigana first was used in Japanese?) in various readers, none of which worked for me. I would, as I said, have simply removed it entirely for that streamlined experience, but in some cases I imagine certain authorial choices/names would have used different readings. In those cases I may have come across them amidst one of several study strategies (see other thread where I ramble to blackmacros), but...

At this point we're good with both Linux and PCs using Word, it seems.

I posted that jpzin (Chinese site?) link above which mentioned smoopy and other readers, but here's an older site where I first learned of various readers: http://www.sky.sannet.ne.jp/at-sushi/aozora/viewer.html

pm215 Member
From: UK Registered: 2008-01-26 Posts: 1354

Asriel wrote:

Now we're making furigana out of ‹‹these guys›› ??

Isn't that what...Rikaichan is supposed to help out with?

Rikaichan will only look up kanji in its dictionary and make a best-guess about readings. Typically in texts marked up with ruby markings like this the markup gives you exactly the furigana that were used in the original printed text. So for example you get to see which choice the author made when there were several possible readings, you get indication of idiosyncratic readings (eg where kanji are used to indicate meaning with furigana of some loanword giving intended pronunciation), you see which words were felt to need furigana and which were left without, and so on. It's just that the aozora markup is ugly (but unavoidable in a basically plain-text situation), so converting it to HTML ruby markup gives you a nicer display.