cb's JNovel Formatter - Printable Version
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: cb's JNovel Formatter (/thread-7229.html)

cb's JNovel Formatter - nest0r - 2011-02-09

pm215 Wrote:
nest0r Wrote: Here's what I meant. You'll see this on most (all?) Aozora texts and I think that goes for hypothetical novels in .txt format.
Yeah. See the aozora docs section (5), talks about ruby markup and has some examples you can check your implementation on. Basically the "full" form is 武州|青梅《おうめ》の宿 where the bar says where the ruby starts and the ruby itself is in the angle brackets. But the bar can be omitted if it would happen at the first boundary between "character classes" before the open angle bracket (where character class = kanji, kana, alphabet, etc.), which happens most of the time.

Actually that makes me more confident in that single find/replace regex above.

If that mark (does it have a name? Other than stick thingy) actually points out where there is a string of kanji across multiple words, thus allowing for notation that prevents overlap of the readings, and when it doesn't occur that means there's non-kanji interrupting the string, then that means the range can be increased or unlimited, no? Like 1,6 or something. Because it's referring to the CJK Unified Ideographs (and whatever else one might think should be used)... so, for the majority of instances, which are unmarked by the stick thingy, even if the regex contains a higher upper limit (like 6) than there are kanji in a given instance, the placement of the furigana will stop at the end of the previous word which won't contain the CJK Unified Ideographs, and thus will always be placed over the first character of the relevant word which immediately precedes the reading brackets.

And by adding the stick thingy at the onset, having it at the beginning as I have it (assuming it's not working for me as I described because of a fluke) sets the limit for when there are those multiple kanji/unbroken character classes in the CJK Unified Ideograph range, so in other words it's a fail-safe for when the numerical range of 6 or infinity is stymied by a string of kanji that might otherwise have caused improper placement of readings over the end of a previous word rather than the beginning of the correct word.

cb's JNovel Formatter - cb4960 - 2011-02-09

I converted iSoron's/pm215's lovely python script into C#. So far I've only tested the cases in the aozora link:

武州|青梅《おうめ》の宿
耳まで火照《ほて》って
すると稍々《やや》度を失った
兄きのような Fanatiker《ファナチイケル》 とは
“Kosinski《コジンスキイ》 soll《ゾル》 leben《レエベン》 !”
そんな|お伽話《フェヤリー・ストーリース》は、
霧の|ロンドン警視庁《スコットランドヤード》…
いいか|釜右ヱ門《かまえもん》。
彼は| Au revoir《さらば》 と、

It successfully ruby tagged each case:

武州<ruby><rb>青梅</rb><rt>おうめ</rt></ruby>の宿
耳まで<ruby><rb>火照</rb><rt>ほて</rt></ruby>って
すると<ruby><rb>稍々</rb><rt>やや</rt></ruby>度を失った
兄きのような<ruby><rb> Fanatiker</rb><rt>ファナチイケル</rt></ruby> とは<ruby><rb> “Kosinski</rb><rt>コジンスキイ</rt></ruby><ruby><rb> soll</rb><rt>ゾル</rt></ruby><ruby><rb> leben</rb><rt>レエベン</rt></ruby> !”
そんな<ruby><rb>お伽話</rb><rt>フェヤリー・ストーリース</rt></ruby>は、
霧の<ruby><rb>ロンドン警視庁</rb><rt>スコットランドヤード</rt></ruby>…
いいか<ruby><rb>釜右ヱ門</rb><rt>かまえもん</rt></ruby>。
彼は<ruby><rb> Au revoir</rb><rt>さらば</rt></ruby> と、

The "<ruby><rb>" after the "とは" in the 4th case was the only thing that seemed slightly off. It seems to have skipped past the line break. I might want to fix that.

I'm going to move out the release a day or two so that I can have more time to test on actual novels. I also want to look into nest0r's concerns in the previous post.

BTW, after a couple of minor modifications, it seems to run well on Linux Mint too.

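For readers following along, the rule pm215 describes (an explicit bar marks the start of the ruby base; otherwise the base runs back to the first character-class boundary before 《) can be sketched in Python roughly as below. This is only an illustration, not the iSoron/pm215 script or cb4960's C# port: the names `char_class` and `to_ruby_html`, the exact character ranges, and the choice to treat 々〆〇ヶ as kanji are all assumptions made for the example.

```python
import re

BAR_CHARS = "|｜"  # ASCII bar as posted in this thread, plus the full-width form
RUBY = re.compile(r"《([^《》]*)》")


def char_class(ch):
    """Rough character classes; the ranges here are assumptions for this sketch."""
    if "\u4e00" <= ch <= "\u9fff" or ch in "々〆〇ヶ":
        return "kanji"
    if "\u3041" <= ch <= "\u309f":
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":
        return "katakana"
    if ch.isascii() and ch.isalnum():
        return "latin"
    return "other"


def to_ruby_html(text):
    """Convert Aozora-style base《reading》 markup into <ruby> tags."""
    out = []
    pos = 0
    for m in RUBY.finditer(text):
        before = text[pos:m.start()]
        bar = max(before.rfind(c) for c in BAR_CHARS)
        if bar != -1:
            # Explicit bar: everything between the bar and 《 is the base.
            prefix, base = before[:bar], before[bar + 1:]
        else:
            # No bar: walk back while the character class stays the same.
            i = len(before)
            if i > 0:
                cls = char_class(before[-1])
                while i > 0 and char_class(before[i - 1]) == cls:
                    i -= 1
            prefix, base = before[:i], before[i:]
        out.append(prefix)
        out.append(f"<ruby><rb>{base}</rb><rt>{m.group(1)}</rt></ruby>")
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


print(to_ruby_html("耳まで火照《ほて》って"))
```

Because the base is found by class boundary rather than by a kanji-only pattern, the alphabetic case (Fanatiker) gets the same treatment as the kanji cases, with the preceding space left outside the <rb> tags.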
cb's JNovel Formatter - nest0r - 2011-02-09

If you're using a proper script made by someone who knows what they're doing, I'm sure that'll make more sense and work better than whatever I was rambling about. ;p

Edit: And for the record I changed the range so it's {1,8}; infinity works also, I bet, not that it matters now. It doesn't cover the alphabetic stuff the way iSoron's does, either. You programmers think you're so smart. I'm taking my toys and going home. Hmmph. ;p

By the way, there are also instances in some texts where there's (I guess) kana iteration marks: https://secure.wikimedia.org/wikipedia/en/wiki/Iteration_mark#Kana So it looks like: |かな《 ゝゝ》 or was it ヽヽ. Something like that. Google for 《ヽヽ》 to see examples.

@hereticalrants - Good idea.

cb's JNovel Formatter - hereticalrants - 2011-02-09

I converted a few of the more difficult books I've been reading for greater ease of dictionary usage. BTW do you have any plans to add support for images? A lot of books use tags like this: [#(img/imagename.jpg)]

cb's JNovel Formatter - cb4960 - 2011-02-10

@nestor, Those should be fairly easy to implement. You can find more info near the bottom of the aozora docs.

@hereticalrants, Image support should also be pretty easy. Thanks for the suggestions.

cb's JNovel Formatter - nest0r - 2011-02-10

cb4960 Wrote: @nestor,

That's in Japanese, though! How am I supposed to read that gibberish?

cb's JNovel Formatter - pm215 - 2011-02-10

cb4960 Wrote: 兄きのような<ruby><rb> Fanatiker</rb><rt>ファナチイケル</rt></ruby> とは<ruby><rb>

I think that strictly speaking the space before "Fanatiker" here should not be inside the <rb></rb> tags, but that error will only have a very minor effect on the result....

cb's JNovel Formatter - dusmar84 - 2011-02-10

Hey guys, sorry for being so dense but what exactly do you mean when you say "specially formatted HTML files"? Would you mind elaborating a little more on the benefits of a JNovel Formatter for the n00bs out there?
Thanks

cb's JNovel Formatter - pm215 - 2011-02-10

nest0r Wrote: It doesn't cover the alphabetic stuff the way iSoron's does, either.

Yes, it's details like getting furigana over alphabetics or furigana over kana that I was expecting to be hard to handle with regexes. Fine if you're just after a quick hack you can use in your editor, you're only doing a few documents and you're prepared to manually fix up any errors, but my guess is that getting a regex solution for this problem from 99% right to 100% right would just be incredibly painful.

cb's JNovel Formatter - nest0r - 2011-02-10

Oh I forgot to link to this, which amongst other things has a table for which browsers support ruby: http://html5doctor.com/ruby-rt-rp-element/ - The bit about using/not using <rp> is interesting. Also, that table's a little outdated regarding Opera (since it has the HTML Ruby plugin, now).

cb's JNovel Formatter - hereticalrants - 2011-02-10

dusmar84 Wrote: Hey guys,

Instead of getting loads of lines that go off of the screen and ruby tags next to kanji you get proper website style formatting (page breaks!) and furigana.

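To make the regex trade-off pm215 describes concrete, here is a minimal Python sketch of a single find/replace of the kind discussed earlier in the thread. It is not nest0r's actual expression (which is not shown here); the pattern, the {1,8} bound borrowed from nest0r's edit, and the name `quick_ruby` are assumptions for illustration.

```python
import re

# Kanji-only base of 1 to 8 characters, optionally preceded by a bar.
# The range is an assumption: CJK Unified Ideographs plus the repeat mark 々.
SIMPLE_RUBY = re.compile(r"[|｜]?([\u4e00-\u9fff々]{1,8})《([^《》]+)》")


def quick_ruby(text):
    """One-shot substitution; handles kanji bases only."""
    return SIMPLE_RUBY.sub(r"<ruby><rb>\1</rb><rt>\2</rt></ruby>", text)


# Works for the common kanji cases:
print(quick_ruby("耳まで火照《ほて》って"))
print(quick_ruby("武州|青梅《おうめ》の宿"))

# Left untouched, because the base is alphabetic:
print(quick_ruby("兄きのような Fanatiker《ファナチイケル》 とは"))

# Mishandled: the bar sits before katakana, so only 警視庁 is picked up
# as the base and the bar stays in the text.
print(quick_ruby("霧の|ロンドン警視庁《スコットランドヤード》…"))
```

The last two calls are the "99% to 100%" problem in miniature: each extra case (alphabetic bases, mixed kana and kanji after a bar) needs another alternation in the pattern, which is where a small parser starts to look simpler.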
cb's JNovel Formatter - cb4960 - 2011-02-10

New set of test cases:

武州|青梅《おうめ》の宿
耳まで火照《ほて》って
すると稍々《やや》度を失った
兄きのような Fanatiker《ファナチイケル》 とは
“Kosinski《コジンスキイ》 soll《ゾル》 leben《レエベン》 !”
そんな|お伽話《フェヤリー・ストーリース》は、
霧の|ロンドン警視庁《スコットランドヤード》…
いいか|釜右ヱ門《かまえもん》。
彼は| Au revoir《さらば》 と、
胡麻塩おやじ[#「おやじ」に傍点]
胡麻塩おやじ[#「おやじ」に傍点]
胡麻塩おやじ[#「おやじ」に白ゴマ傍点]
胡麻塩おやじ[#「おやじ」に丸傍点]
胡麻塩おやじ[#「おやじ」に白丸傍点]
胡麻塩おやじ[#「おやじ」に黒三角傍点]
胡麻塩おやじ[#「おやじ」に白三角傍点]
胡麻塩おやじ[#「おやじ」に二重丸傍点]
胡麻塩おやじ[#「おやじ」に蛇の目傍点]
[#(img/imagename.jpg)]
[#(img/imagename.png)]
[#(img/imagename.bmp)]
[#(img/imagename.gif)]

New set of results:

武州<ruby><rb>青梅</rb><rp>《</rp><rt>おうめ</rt><rp>》</rp></ruby>の宿
耳まで<ruby><rb>火照</rb><rp>《</rp><rt>ほて</rt><rp>》</rp></ruby>って
すると<ruby><rb>稍々</rb><rp>《</rp><rt>やや</rt><rp>》</rp></ruby>度を失った
兄きのような <ruby><rb>Fanatiker</rb><rp>《</rp><rt>ファナチイケル</rt><rp>》</rp></ruby> とは
“<ruby><rb>Kosinski</rb><rp>《</rp><rt>コジンスキイ</rt><rp>》</rp></ruby> <ruby><rb>soll</rb><rp>《</rp><rt>ゾル</rt><rp>》</rp></ruby> <ruby><rb>leben</rb><rp>《</rp><rt>レエベン</rt><rp>》</rp></ruby> !”
そんな<ruby><rb>お伽話</rb><rp>《</rp><rt>フェヤリー・ストーリース</rt><rp>》</rp></ruby>は、
霧の<ruby><rb>ロンドン警視庁</rb><rp>《</rp><rt>スコットランドヤード</rt><rp>》</rp></ruby>…
いいか<ruby><rb>釜右ヱ門</rb><rp>《</rp><rt>かまえもん</rt><rp>》</rp></ruby>。
彼は<ruby><rb> Au revoir</rb><rp>《</rp><rt>さらば</rt><rp>》</rp></ruby> と、
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>﹅﹅﹅</rt><rp>》</rp></ruby>
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>﹅﹅﹅</rt><rp>》</rp></ruby>
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>﹆﹆﹆</rt><rp>》</rp></ruby>
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>●●●</rt><rp>》</rp></ruby>
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>○○○</rt><rp>》</rp></ruby>
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>▲▲▲</rt><rp>》</rp></ruby>
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>△△△</rt><rp>》</rp></ruby>
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>◎◎◎</rt><rp>》</rp></ruby>
胡麻塩<ruby><rb>おやじ</rb><rp>《</rp><rt>◉◉◉</rt><rp>》</rp></ruby>
<img src="img/imagename.jpg" />
<img src="img/imagename.png" />
<img src="img/imagename.bmp" />
<img src="img/imagename.gif" />

---

Getting closer!

As you can see I fixed the issues with the original test cases and I added support for emphasis markers and images. With all of these ruby tags floating about, it is more critical that I separate pages in a cleaner fashion. That is, not in the middle of a sentence. That's next.

Also, can somebody please paste the error message that occurs when selecting a non-TT font? I seem unable to reproduce it on my end.

@dusmar84 Now that you've pointed it out, I don't really like that wording. Maybe I'll replace "specially" with "nicely" (or something like that - description suggestions are welcome).

cb's JNovel Formatter - nest0r - 2011-02-11

Excellent. I'm glad I happen to have a batch of .txt files with .jpg files to use with this.

This is the error: System.ArgumentException: Only TrueType fonts are supported. This is not a TrueType font.

By the way, now that Slattery's got this Desktop Player thing (http://forum.koohii.com/showthread.php?tid=7242) and we now have two interactive audiobook tools, I thought the idea (which I mentioned in that thread just recently) of bookmarking you have going here could be used so that people could, like, bookmark the spot in the audio + highlighted text that occurs in Kage Shibari and Desktop Player. That kind of integration is just something to keep in mind I guess, to have that kind of formatting.... Though I think it's something that might need to be taken care of on their end, I'm mentioning it here.

For kicks I ran a .trs file through JNF, renaming it back to .trs, and it plays in balloonguy's tool as normal but without the 。's. Desktop Player won't display any text at all with the same file. Ah well.

cb's JNovel Formatter - mafried - 2011-02-11

cb4960, have I told you how awesome you are? No, really, monuments shall be erected in your honor. (Don't let it get to your head, for all our sakes)

cb's JNovel Formatter - nest0r - 2011-02-11

Have you seen the Readability plugin? I kind of like how it lets you adjust the margins.
(Though the plugin itself takes some work if using with JNF-created files, doesn't quite look right because of font/paragraph changes, ruby, etc.)

cb's JNovel Formatter - cb4960 - 2011-02-12

Hello,

I have just released version 3.0 of JNovel Formatter.

Download JNovel Formatter v3.0 via Google Code
Download JNovel Formatter v3.0 Source Code via Google Code

What changed?
- Added Linux support.
- Added aozora ruby support (Example: 武州|青梅《おうめ》の宿).
- Added aozora emphasis support (Example: 胡麻塩おやじ[#「おやじ」に傍点]).
- Added aozora image support (Example: [#挿絵(img/imagename.jpg)入る]).
- The following aozora format constructs are removed: [#改ページ] [#ここから<<number>>字下げ] [#ここで字下げ終わり]
- Program will no longer separate pages in the middle of a sentence.
- Margin now affects only left and right of page.
- Catches and ignores exception about only TrueType fonts being supported. (Let me know if you still get the error).
- The Complete Dialog's button now leads to the directory with the HTML files instead of the directory above that.

Edit: Downloads are now hosted on Google Code instead of MediaFire.

cb4960

cb's JNovel Formatter - ta12121 - 2011-02-12

mafried Wrote: cb4960, have I told you how awesome you are? No, really, monuments shall be erected in your honor.

same here, have I told you how awesome you are?

cb's JNovel Formatter - nest0r - 2011-02-12

Amazing! Works very nice. I just tried it with images which displayed as desired (once I stuck them in the right folder, of course). Thanks for your effort as always.

I did notice that when I tried certain Japanese fonts (えり字, or stuff from that wazu site, for example), when I selected them in the drop-down, the font remained unchanged as MS PMincho. No problem selecting non-TrueType fonts, though. And of course if I just edit the .html file I can replace MS PMincho with ‘Eriji’, for example. Although that font's not as good as I thought, now that I tried it with a novel.
Hard to read and doesn't have all the necessary kanji, which results in blank spaces (sometimes blank spaces with furigana, haha).

Also: This is entirely off-topic, but since I mentioned Readability, I should mention that I just noticed it has an autoscroll function (you trigger it and it slowly scrolls, and you can increase speed, like a vertical slideshow of sorts). Kind of neat. I bet people could use that to, like, force themselves to read at a certain pace or something. ;p

Edit: Oh I'm dense. Just noticed you already have an option to adjust the margins. Stop with the time travel already!

cb's JNovel Formatter - cb4960 - 2011-02-12

nest0r Wrote: I did notice that when I tried certain Japanese fonts (えり字, or stuff from that wazu site, for example), when I selected them in the drop-down, the font remained unchanged as MS PMincho.

Did the text box to the right of the font button remain unchanged or just the preview window? If it's just the preview window, the program might successfully insert the non-TTF font name into the HTML.

cb's JNovel Formatter - nest0r - 2011-02-12

cb4960 Wrote:
nest0r Wrote: I did notice that when I tried certain Japanese fonts (えり字, or stuff from that wazu site, for example), when I selected them in the drop-down, the font remained unchanged as MS PMincho.
Did the text box to the right of the font button remain unchanged or just the preview window? If it's just the preview window, the program might successfully insert the non-TTF font name into the HTML.

Both, unfortunately. Hmm, so it's still the TrueType thing.

Edit: I found this about the .NET Framework not allowing (Adobe) OpenType, at least when using certain elements: http://bytes.com/topic/net/answers/124293-usage-open-type-fonts-net#post430288 I think that's old though, they still don't support it? Hmmph.
I also found this where it talks about GDI and DirectWrite and Windows 7: http://forums.madcapsoftware.com/viewtopic.php?f=10&t=10675&start=0&st=0&sk=t&sd=a#p61479

cb's JNovel Formatter - hereticalrants - 2011-02-12

cb4960 Wrote: - Added aozora ruby support.

This program is now pretty much everything I would want out of an aozora to html formatter. In one of the files I converted it made a huge gap of whitespace, but that's easily scrolled past.

I will now attempt to teleport some cookies directly to your work space as thanks. ...nnngggggg...

(EDIT: my telekinetic powers didn't work. sorry.)

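As an illustration of the emphasis and image constructs listed in the v3.0 changelog, the conversion could be sketched in Python along these lines. The mark table is copied from the results cb4960 posted earlier in the thread; everything else (the function names, the regexes, accepting both ASCII and full-width brackets, leaving an annotation alone when its target text is not directly in front of it) is an assumption for the example and not how JNovel Formatter itself is written.

```python
import re

# Annotation name -> emphasis mark, as shown in the posted results.
MARKS = {
    "傍点": "﹅", "白ゴマ傍点": "﹆", "丸傍点": "●", "白丸傍点": "○",
    "黒三角傍点": "▲", "白三角傍点": "△", "二重丸傍点": "◎", "蛇の目傍点": "◉",
}

EMPHASIS = re.compile(r"[\[［][#＃]「([^」]+)」に(" + "|".join(MARKS) + r")[\]］]")
IMAGE = re.compile(r"[\[［][#＃][^\]］(（]*[(（]([^)）\]］]+)[)）][^\]］]*[\]］]")


def ruby(base, reading):
    """Ruby markup with <rp> fallback brackets, matching the posted output."""
    return f"<ruby><rb>{base}</rb><rp>《</rp><rt>{reading}</rt><rp>》</rp></ruby>"


def convert_emphasis(text):
    """Turn text[#「text」に傍点] into ruby with one mark per character."""
    out = []
    pos = 0
    for m in EMPHASIS.finditer(text):
        target, mark = m.group(1), MARKS[m.group(2)]
        before = text[pos:m.start()]
        if before.endswith(target):
            out.append(before[:len(before) - len(target)])
            out.append(ruby(target, mark * len(target)))
        else:
            # Target is not directly in front: keep everything unchanged.
            out.append(before + m.group(0))
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


def convert_images(text):
    """Turn [#(path)] or [#挿絵(path)入る] into an <img> tag."""
    return IMAGE.sub(r'<img src="\1" />', text)


print(convert_emphasis("胡麻塩おやじ[#「おやじ」に傍点]"))
print(convert_images("[#挿絵(img/imagename.jpg)入る]"))
```

Emitting emphasis as ruby reuses the furigana machinery, which appears to be what the posted results do; a CSS-based approach would be an alternative, but that is outside what the thread shows.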
cb's JNovel Formatter - Daichi - 2011-02-12

This is pretty neat. cb, you might want to make a master topic for all of your utilities, and have all your tools link to & from the master topic.

One of my ebook files has images in normal html tags. They look like this.

Code:
<img height=600 src="img/000a.jpg">

Your tool exports the file with these image tags, but it would be nice if it could move the images to the output directory also.

Anyway, keep up the good work with all your tools.

cb's JNovel Formatter - cb4960 - 2011-02-12

nest0r Wrote: Both, unfortunately. Hmm, so it's still the TrueType thing.

Would you mind trying this test build when you get the chance? I've changed the way the user selects fonts. Even if the preview can't use a font, it should have no effect on the HTML output files. That is, the HTML output files should use whatever font you entered, TTF or otherwise.

Download JNovel Formatter Font Test via MediaFire

Thanks!

cb's JNovel Formatter - cb4960 - 2011-02-12

@hereticalrants, It's the thought that counts. Also, I might add some sort of option to limit the number of consecutive newlines.

@Daichi, Thanks for the suggestions.

cb's JNovel Formatter - nest0r - 2011-02-12

cb4960 Wrote:
nest0r Wrote: Both, unfortunately. Hmm, so it's still the TrueType thing.
Would you mind trying this test build when you get the chance?

Yes! That works (well, in order to display them properly I had to use Firefox 4 Portable or Google Chrome, but that's a browser issue). Although a minority of font names don't show up (such as Aoyagi Kouzan or Eriji). Had to type those in. I think that's something to do with those font names being idiosyncratic in some way (it's not a kanji/kana issue, because many fonts with those in the title show up in the drop-down list).
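
On the option cb4960 floats above for limiting consecutive newlines (the "huge gap of whitespace" hereticalrants ran into), a minimal Python sketch of the idea might look like this. The function name `limit_blank_lines` and its `max_newlines` parameter are hypothetical, and it assumes line endings have already been normalized to "\n".

```python
import re


def limit_blank_lines(text, max_newlines=2):
    """Collapse any run of more than max_newlines newlines down to max_newlines."""
    pattern = r"\n{%d,}" % (max_newlines + 1)
    return re.sub(pattern, "\n" * max_newlines, text)


sample = "第一章\n\n\n\n\n\n本文がここから始まる。"
print(limit_blank_lines(sample))
```

Running this before page splitting would keep a long gap in the source .txt from producing a mostly empty page, though whether that is the right place in the pipeline is a guess.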