Joined: Oct 2007
Posts: 4,582
Thanks: 0
@rich_f
I've been using UltraEdit for so long I don't know if I could change it. It seems to be able to handle lots of stuff. Got a script I can copy/paste? ;p
Edited: 2010-02-24, 2:08 pm
Joined: Jul 2007
Posts: 1,879
Thanks: 19
@nest0r
Nope, not yet. I still have to read the 47-page tutorial on regular expressions, and that's going to have to wait for a few days while I take care of Other Stuff. I've got a JLPT review class to prep for (I'm enjoying it), and an article to write.
EditPadLite has a functioning regular expression search box, and it should have the same PDF if you're really impatient. XD
Joined: Oct 2007
Posts: 4,582
Thanks: 0
Funny that you mention a 'functioning' regex search box, because mine in UltraEdit is nonfunctioning. Looks like EditPadPro can handle Shift-JIS too, while I can't get that working in UltraEdit. Hmm.
My basic idea at the moment is Find:
kanji variable string thingy《kana variable string thingy》
and Replace with
<ruby><rb>same kanji as above</rb><rp>(</rp><rt>same kana as above</rt><rp>)</rp></ruby>
... That's assuming what I pasted from a rubified xhtml page would then work if I opened the file in Firefox. Also assuming I don't need to add line breaks if I need to save and open as html rather than click the txt and 'open with' Firefox.
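For anyone who'd rather script the Find/Replace idea above than fight an editor, here is a minimal Python sketch of it. Assumptions that are not from the thread: the name `rubify` is made up, and "kanji" is approximated by the CJK Unified Ideographs block plus 々. It only covers the simple case where the reading applies to the run of kanji sitting right before the 《.

```python
import re

# Assumption: "kanji" = CJK Unified Ideographs plus the iteration mark 々.
# Rarer characters outside this range will not be matched.
KANJI = r"[\u4E00-\u9FFF\u3005]"

# One or more kanji, immediately followed by a reading inside 《...》.
IMPLICIT_RUBY = re.compile(rf"({KANJI}+)《([^《》]+)》")

# Same markup as the Replace string in the post; \1 is the kanji, \2 the kana.
RUBY_TEMPLATE = r"<ruby><rb>\1</rb><rp>(</rp><rt>\2</rt><rp>)</rp></ruby>"


def rubify(text: str) -> str:
    """Replace every kanji《kana》 pair with ruby markup."""
    return IMPLICIT_RUBY.sub(RUBY_TEMPLATE, text)


if __name__ == "__main__":
    print(rubify("漢字《かんじ》を読む"))
```

Text without any 《》 passes through untouched, so it should be safe to run over a whole file.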
Edited: 2010-02-24, 2:48 pm
Joined: Jan 2008
Posts: 1,458
Thanks: 20
It is, as they say, not *quite* that simple: you also need to deal with the case where there is an explicit start-of-ruby marker. You probably want to replace those first, to get them out of the way so they don't end up as false positives for the main replace.
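A rough Python sketch of that ordering, markers first and the plain kanji run second. Assumptions here: the start-of-ruby marker is the full-width bar ｜ (U+FF5C), "kanji" means the CJK Unified Ideographs block plus 々, and the names `EXPLICIT`, `IMPLICIT` and `rubify_two_pass` are invented for the example.

```python
import re

KANJI = r"[\u4E00-\u9FFF\u3005]"  # assumption: good enough for "kanji"
TEMPLATE = r"<ruby><rb>\1</rb><rp>(</rp><rt>\2</rt><rp>)</rp></ruby>"

# Pass 1: explicit base text. Everything between ｜ and 《 is the base,
# so it may contain kana as well as kanji.
EXPLICIT = re.compile(r"｜([^｜《》]+)《([^《》]+)》")

# Pass 2: no marker. The base is the kanji run directly before 《.
IMPLICIT = re.compile(rf"({KANJI}+)《([^《》]+)》")


def rubify_two_pass(text: str) -> str:
    """Convert marked ruby first, then whatever unmarked ruby is left."""
    text = EXPLICIT.sub(TEMPLATE, text)
    return IMPLICIT.sub(TEMPLATE, text)


if __name__ == "__main__":
    print(rubify_two_pass("武州｜青梅《おうめ》の山《やま》"))
    print(rubify_two_pass("｜取り扱い《とりあつかい》"))
```

Running the passes the other way round would match only 青梅 in the first sample and leave a stray ｜ behind, and it would miss the second sample entirely because the character before 《 is kana.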
Joined: Jul 2007
Posts: 1,879
Thanks: 19
Yeah, there's that too.
And you could also save yourself a lot of headache by doing a global search for EOLs and replacing them with </p>, newline, <p>, so you have paragraphs formatted properly. Just have to add the one initial <p> at the top of the doc.
You can figure out how to build a regex for finding EOLs by reading the tutorial, I think. He says that the regex he uses in there is based on Perl's handling of regex, if I remember correctly.
Joined: Oct 2007
Posts: 4,582
Thanks: 0
Maybe I can just get used to reading furigana inline/to the right instead of superscripted. ;p
Joined: Oct 2007
Posts: 4,582
Thanks: 0
So I clicked the .py file and an empty black window came up with a flashing cursor. Make it work. Also, that other file doesn't have an extension; what's it for? /n00b
Edited: 2010-02-24, 6:59 pm
Joined: Jul 2007
Posts: 1,879
Thanks: 19
I also had success with some simple regular expressions in EditPad. (You can do this in Lite or Pro. I did it in Pro.) You need to do 4 search and replaces. Turn on the Regular Expression switch in the panel, then:
Copy the first pipe-looking thing (it's *not* a pipe) and paste it into the find box, then in the replace box, type: <ruby><rb> Replace all.
Copy the first << looking thing, paste, replace all with </rb><rt>
Copy/replace the >> with </rt></ruby>
Now the fun part: enter the following into the find box:
\r\n
And enter this into the replace box:
</p>\r\n<p>
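It's not ideal, but this last step will replace every newline with a </p>, start a new line, then add a <p>. You'll have to tidy up by hand at the top of the doc (the first paragraph has no opening <p>) and at the bottom (a dangling <p> is left over if the file ends with a newline).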
Then just slap on the header.html and footer.html from iSoron's site.
It's not as fast as the macro, but if you don't have Word, it's a viable alternative.
You could just load up a bunch of txt files all at once, run step one on a crapton of them, then step 2, etc...
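The same four replaces can be scripted, which helps if you have a pile of txt files. This is only a sketch: it assumes the pipe-looking thing is ｜ (U+FF5C) and the << and >> are 《 and 》, and `four_step_convert` is a made-up name. Like the editor version, it only gives sane markup for ruby that carries the ｜ marker, and it leaves the same loose ends at the top and bottom.

```python
def four_step_convert(text: str) -> str:
    """Apply the four search-and-replace steps in order."""
    text = text.replace("｜", "<ruby><rb>")    # step 1: start-of-ruby marker
    text = text.replace("《", "</rb><rt>")     # step 2: start of the reading
    text = text.replace("》", "</rt></ruby>")  # step 3: end of the reading
    text = text.replace("\r\n", "</p>\r\n<p>")  # step 4: paragraph breaks
    return text


if __name__ == "__main__":
    sample = "｜青梅《おうめ》\r\n次の行"
    print(four_step_convert(sample))
```

If you read the file with Python's `open()`, pass `newline=""` so the CR+LF pairs aren't translated to a bare `\n` before step 4 runs.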
Edited: 2010-02-25, 3:53 am
Joined: Oct 2007
Posts: 4,582
Thanks: 0
面白い。I'd noticed someone (via Google) mentioning that pipe (I mean, 'ceci n'est pas une pipe') as a marker for which kanji the furigana applies to, but after the initial example occurrence, is it invisible in the text, or something? Same with the \r\n stuff. At any rate, perhaps some variation of this could be applied to t-time files... though now that I have 99% of the texts taken care of, I'm good, can finally stop obsessing. ^_^
Joined: Jul 2007
Posts: 1,879
Thanks: 19
In regex, \r stands for a CR, and \n stands for a LF. When you hit Enter/Return on a line of text in Windows, you generate a CR and an LF. In Unix, you just generate an LF. So all you're doing here is searching for that CR+LF at the end of paragraphs, putting in the close paragraph html tag, throwing the CR+LF back in, then adding an open paragraph tag at the beginning of the new line. It means you have to clean things up a little at the top and bottom of the doc, but that's the best I could think of.
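If you're scripting it anyway, one way around the cleanup is to split the text into lines and wrap each one, rather than replacing the line endings themselves. A small sketch, with `wrap_paragraphs` being a hypothetical name and the choice to skip blank lines being mine:

```python
def wrap_paragraphs(text: str) -> str:
    """Wrap every non-blank line in <p>...</p>."""
    # str.splitlines() splits on CR+LF, bare LF and bare CR (among other
    # line boundaries), so Windows and Unix files are treated the same.
    lines = text.splitlines()
    # Assumption: blank lines carry no content and can be dropped.
    return "\n".join(f"<p>{line}</p>" for line in lines if line.strip())


if __name__ == "__main__":
    print(wrap_paragraphs("一行目\r\n二行目\n"))
```

Because each line gets its own opening and closing tag, nothing is left dangling at either end of the document.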
As for the python scripts, yeah, that's what I was trying with OS X, but I wasn't having any sort of luck with that. I had the latest version of Python 2 loaded, and it still gacked. (Is it a Python 3 script or something?) (Do you have to create the output file before you run the script, or can it create the file if the file you specify for output doesn't exist?)
Cygwin will let you run bash scripts on Windows, but I don't particularly want to install a big chunk of stuff like that just to run one bash script.
Joined: Oct 2007
Posts: 4,582
Thanks: 0
@pm215: Ahhh, Linux. I mean I clicked it, why else wouldn't it do stuff for me? Stupid Linux. ;p And UTF-8, I think everybody except for Aozora types is well acquainted with UTF-8.
@rich_f: I had tried to run that code with my broken UltraEdit regex stuff again, guess that's why it told me '\r\n' was not found or somesuch. ;p There's still the problem of the pipe-thingy being only on certain multi-kanji ruby or whatever.
Edited: 2010-02-25, 11:20 am
Joined: Feb 2008
Posts: 1,322
Thanks: 0
So, what exactly is going on with this whole script-running fiasco?
Last I checked people were running analysis of vocab/kanji/etc...
Now we're making furigana out of ‹‹these guys›› ??
Isn't that what...Rikaichan is supposed to help out with?