
cb's JNovel Formatter

#76
I think I traced my sudden ability to use the layout settings to this: first running the 32-bit version of 7-Zip w/ MS AppLocale to unzip the txtmiru2.zip file, and then running the unzipped .exe w/ MS AppLocale. Then I manually selected the fonts (even w/o changing them, I think they need to be reselected manually? Make sure the running head font is ‘normal’ while the others have @), applied the settings and closed the dialogue, then opened the layout settings successfully, and finally it generated the Layout folder and .lay files. Whew. ;p

From there I was able to mess with the size and placement of all the text, but you still can't switch it to a single page, as far as I can tell.

I also experimented with running the program w/o MS AppLocale, after copying the Layout folder from elsewhere. The dialogue window opened, then the program crashed.
Edited: 2011-04-09, 3:39 pm
#77
Okay, if you really want just the one page at a time, you can use the first version of TxtMiru (w/ Shift_JIS files): http://www.vector.co.jp/soft/win95/util/se180707.html

Doesn't seem to convert images, but it handles basics like furigana. No page animations, just a single continuous window that you can resize. You can change font size in the normal settings dialogue. Selectable text.

I'm sure that, theoretically, once you got the layout settings in version 2.0 working, you could arrange the text so that it's like a single page. Maybe. The のど field seems to reduce the middle margin when you change it to 0. Anyway, I give up. Now you've got me obsessed even though I don't want it single page.
Edited: 2011-04-09, 9:52 pm
#78
Using AppLocale and changing all of the default fonts seems to have allowed me to open the layout dialog. However, increasing the font size results in overlapping text and text that flows outside of the white page area and into the gray background area.

I tried version 1, but was put off by its lack of anti-aliased fonts and inability to scroll with my mouse's scroll wheel.
Edited: 2011-04-10, 11:00 am
#79
Yeah TxtMiru1 is rubbish compared to 2.0. ;p

I just tested TxtMiru2 on my tm2 and I can see how it could get irksome. Especially when I had it in tablet mode and it became very small. Perhaps 3.0... ?
#80
Wanted to format a PDF with vertical Japanese from a novel (.txt), but Aozora produces only weird stuff instead of kana and kanji in the file. Do I need to open the txt first and convert it into UTF-8? If so, I tried this already and ticked the 'UTF-8' in the application, but same result, only garbage...
#81
Tori-kun Wrote:Wanted to format a PDF with vertical Japanese from a novel (.txt), but Aozora produces only weird stuff instead of kana and kanji in the file. Do I need to open the txt first and convert it into UTF-8? If so, I tried this already and ticked the 'UTF-8' in the application, but same result, only garbage...
青P seems to work only with UTF-8 input, so after conversion it should have worked fine. I wonder if your file would display correctly with TxtMiru2.
#82
Well, for me "conversion" means opening the .txt file in Programmer's Notepad, ticking Encoding->UTF-8, and saving afterwards (!). But if I open the file in Programmer's Notepad, garbage is still displayed, whereas TxtMiru does a fine job (unfortunately it has no 'save' option, darn!). No idea what's wrong o0
#83
@Tori-kun

Did you tick Encode in UTF-8 or Convert to UTF-8?

At any rate, if you run the .txt through cb4960's tool Aozora Gaiji Replacer, that should convert it. http://forum.koohii.com/showthread.php?p...#pid133661
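For what it's worth, the difference between the two menu items can be sketched in Python. This is only an illustration, not what any of the tools above do internally. It assumes the novel was saved as Shift_JIS (cp932 on Windows) and that an "Encode in UTF-8" style option merely reinterprets the existing bytes; the function name and sample text are made up.

```python
def convert_to_utf8(raw: bytes, source_encoding: str = "cp932") -> bytes:
    """Decode bytes with the encoding they were really saved in, then re-encode as UTF-8."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Stand-in for the bytes of a Shift_JIS novel file (assumption).
sjis_bytes = "吾輩は猫である".encode("cp932")

# Real conversion: the Japanese text survives.
utf8_bytes = convert_to_utf8(sjis_bytes)

# Merely reading the same bytes as if they were UTF-8 yields replacement characters.
misread = sjis_bytes.decode("utf-8", errors="replace")
```

To apply this to a file, read it with `Path(...).read_bytes()`, pass the result through `convert_to_utf8`, and write it back with `write_bytes` to a new file name so the original stays intact.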
#84
You're a god nest0r! Works like a charm!~
#85
Tori-kun Wrote:You're a god nest0r! Works like a charm!~
A king, not god!

"He became king after Heracles killed Neleus and all of Nestor's siblings." Wikipedia.
#86
Deeply sorry. Thanks for that precious correction, jettyke-sama!
#87
...Don't take it seriously
#88
No take-backs.

By the way, I love that there's a section in Wikipedia's article on Nestor that talks about whether Nestor actually gave good advice. And for the record, the alias is unrelated.
Edited: 2011-04-13, 10:06 am
#89
jettyke Wrote:...Don't take it seriously
..naturally I didn't o0 *off topic end*
#90
Slightly off-topic, but this is a great introduction if you want to understand what's going on with all those UTF-8 checkboxes: http://www.joelonsoftware.com/articles/Unicode.html
#91
I took another look at TxtMiru2.0 and found a way to reduce page size!

There's a trick to it. Here's the general procedure:

1) Open TxtMiru2.exe with AppLocale
2) Click ツール(T) | 設定 (S) to open the preference dialog
3) Change the 5 fonts to something other than the default. The first 3 should be set to a vertical text font (font with a @ in front). The next 2 should be set to a normal non-vertical font.

The above steps enable you to open the layout dialog. Now let's open it:

4) Click ツール(T) | レイアウト設定 to open the layout dialog.
5) Change settings. For reducing the page size, the important settings are:

行数 (Important. # of lines per page. I reduced this from 17 to 12.)
字詰 (Somewhat Important. # of characters per line. I lowered this slightly from 40 to 38.)
本文文字サイズ (Somewhat Important. Text size. I increased it from 310 to 370.)
小口 (Very Important. Amount to push right page to the left. I changed this from 1000 to 7000.)
のど (Very Important. Amount to push left page to the left. I changed this from 1000 to 3400.)

There are other settings for adjusting the page number position, but those are less important.

6) Press OK on the layout dialog and close the program. This will create a layout folder with a file called Bunko.lay inside.

Here comes the trick:

7) In Bunko.lay, edit the PaperSize field. Original is "21000,14800", I used "14500,14800" to reduce the width of the paper. Note that many of the settings won't take effect unless edited with the layout dialog (for some unknown reason). PaperSize is an exception.

8) Open TxtMiru2.0 again. You will notice the paper size has narrowed.

Here comes another trick:

9) If you want to adjust something other than PaperSize, you must use the layout dialog (Steps 4-6). However, after step 6, the PaperSize will go back to its default value. So to perfect your settings, repeat steps 4 to 9 as many times as it takes.

Contents of my Bunko.lay:
LayoutType,BUNKO 1.0
LayoutName,
PaperSize,14500,14800
PageCharCount,24,38
TextSize,370,20
RubySize,160,20
NombreSize,300,0
NoteSize,20,20
RunningHeadsSize,5,0
Nombre1Format,%1!d!
Nombre2Format,%1!d!
NumbreFormatType,0
TextLayout,13900,500,14000,13800,12,38
TextLayout,7000,500,7100,13800,12,38
NombreLayout,600,150,1500,450,12,38
NombreLayout,13000,150,13900,450,12,38
RunningHeadsLayout,0,0,10500,5,12,38
NoteLayout,0,500,7000,13800,175,665
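
Since step 9 means re-editing PaperSize after every trip through the layout dialog, the edit could be scripted. The following is a rough sketch, not part of TxtMiru: the function name is made up, and it only assumes the "PaperSize,width,height" line format shown in the listing above. The file's real encoding and line endings are unknown, so check those before writing anything back to Bunko.lay.

```python
def set_paper_size(lay_text: str, width: int, height: int) -> str:
    """Return the .lay text with its PaperSize line replaced; other lines are untouched."""
    out = []
    found = False
    for line in lay_text.splitlines():
        if line.startswith("PaperSize,"):
            line = f"PaperSize,{width},{height}"
            found = True
        out.append(line)
    if not found:
        raise ValueError("no PaperSize line found")
    return "\n".join(out) + "\n"


# First lines of a freshly generated layout file, as described in step 7.
sample = "LayoutType,BUNKO 1.0\nLayoutName,\nPaperSize,21000,14800\nPageCharCount,24,38\n"
patched = set_paper_size(sample, 14500, 14800)
```

Run it on the file contents after closing TxtMiru2 (step 6), then reopen the program (step 8).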
Edited: 2011-04-21, 11:46 pm
#92
That is awesome and kind of insane. ;p

I will try setting it up like this for my netvertible, thanks.
#93
Can I do the following with any of your tools, and I'm just missing it? - Strip all the Aozora formatting from a batch of texts. Purpose: I'm processing collocations in AntConc, but it's messy.

Edit: Although in the meantime, searching clusters by n-gram seems to work better.
Edited: 2011-06-08, 10:00 pm
#94
nest0r Wrote:Can I do the following with any of your tools, and I'm just missing it? - Strip all the Aozora formatting from a batch of texts. Purpose: I'm processing collocations in AntConc, but it's messy.

Edit: Although in the meantime, searching clusters by n-gram seems to work better.
I whipped up something real quick-like (so try not to break it :) ):

Download AozoraRemover v1.0 (source code included)

Just enter a root directory and it will remove Aozora formatting from all .txt files in that directory (and optionally any subdirectories). The Aozora-less files will be placed in the output directory. Your original files will be unharmed.
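
For anyone curious what "removing Aozora formatting" involves, here is a minimal sketch in Python. It is not the AozoraRemover source, just an illustration covering the three most common markers: ruby readings in 《》, the ｜ ruby start mark, and ［＃...］ annotations. The function name and the sample sentence are made up.

```python
import re

RUBY = re.compile(r"《[^》]*》")           # ruby readings, e.g. 猫《ねこ》
ANNOTATION = re.compile(r"［＃[^］]*］")    # editor annotations, e.g. ［＃改ページ］
RUBY_START = "｜"                         # marks where the ruby base text begins


def strip_aozora(text: str) -> str:
    """Remove common Aozora Bunko markup, keeping only the base text."""
    text = ANNOTATION.sub("", text)
    text = RUBY.sub("", text)
    return text.replace(RUBY_START, "")


sample = "｜吾輩《わがはい》は猫《ねこ》である。［＃改ページ］"
cleaned = strip_aozora(sample)
```

Other markup (gaiji references, header and footer blocks) is not handled here.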
#95
Awesome, thanks! You can't be human. I had to convert the files to UTF-8, though. So much trouble! ;p
#96
Even better: implement PDF compilation to your Novel Formatter!! AWESOMENESS!
#97
cb4960 Wrote:
nest0r Wrote:Can I do the following with any of your tools, and I'm just missing it? - Strip all the Aozora formatting from a batch of texts. Purpose: I'm processing collocations in AntConc, but it's messy.

Edit: Although in the meantime, searching clusters by n-gram seems to work better.
I whipped up something real quick-like (so try not to break it :) ):

Download AozoraRemover v1.0 (source code included)

Just enter a root directory and it will remove Aozora formatting from all .txt files in that directory (and optionally any subdirectories). The Aozora-less files will be placed in the output directory. Your original files will be unharmed.
Do you think you could also strip the HTML stuff? Like, I'll get results for ‘clusters’ in AntConc such as this:

chap1">   序

By the way, the current settings I'm using for the clusters tab are: search simply for a wildcard (*) with the Words box ticked, min. cluster freq. of 5, and min. and max. cluster size of 2 and 3. Still playing with the settings, especially the leeway I'm giving the ranges and search queries (such as regex) with regard to how AntConc parses terms.

I'm also thinking about how to use the batch query features, as you can import a list and find clusters/collocations involving those items (thus I think generating a list from most frequent words then finding their collocations might work). But I digress.

Edit: At the moment actually, I suppose I'm getting the best results with n-grams in the clusters tab (only problem is it lacks punctuation, etc. tokens in the list, which messes up the concordance search [albeit easily corrected through KWIC sorting a portion of n-gram cluster in concordance tab], and if you enable other token forms in the settings this skews the cluster identification). Bigrams and trigrams (n-gram min. of 2 and max. of 3) are most common in corpus linguistics, so that works for me here. Only problem is that AntConc has a tendency to parse stuff weirdly, but it's not too bad. I wonder if it's me/the program or perhaps how the texts are encoded; I will need to experiment further with Shift_JIS vs. UTF-8 n-grams. Edit 2: Hmm, min. n-gram size of 1 seems to work better due to how n-grams are parsed oddly.

As for collocates, I'm getting best results with # (one word) as the search query; using 1L and 1R as the span for this for the time being. Suppose I'll make a new thread on the topic eventually rather than update here. ;p
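
To make those cluster settings concrete, here's a rough Python sketch of counting n-grams of size 2 to 3 and keeping those seen at least 5 times. It is not how AntConc works internally. It assumes the text has already been split into tokens (Japanese has no spaces, so some segmentation step is presumed), and the function name and sample tokens are made up.

```python
from collections import Counter


def ngram_clusters(tokens, min_n=2, max_n=3, min_freq=5):
    """Count every n-gram with min_n <= n <= max_n and keep the frequent ones."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}


# Toy token stream: the same three tokens repeated five times.
tokens = ["気", "が", "する"] * 5
clusters = ngram_clusters(tokens)
```

In this toy stream, n-grams that straddle a repetition boundary only occur four times, so they fall under the frequency cutoff.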
Edited: 2011-06-09, 4:05 pm
#98
Hello,

I have just released version 1.1 of Aozora Remover.

Download Aozora Remover v1.1 via Media Fire

Download Aozora Remover v1.1 Source Code via Media Fire

[Image: aozoraremovermain.png]

What changed?

● Added HTML tag removal (thanks nest0r!).
● Added input Encoding option (thanks nest0r!).
● Added option to only process certain extensions.
● Added basic settings file.
● Polished interface a little.
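
As a rough idea of what tag removal involves (a sketch only, not the code in Aozora Remover), Python's standard `html.parser` can be used to keep the text and drop the tags. The class name, function name, and sample anchor tag are made up; the sample is only a guess at the kind of markup that produced nest0r's chap1"> fragment.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect only the text between tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text: str) -> str:
    parser = TagStripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


sample = '<a name="chap1">   序</a>'
plain = strip_html(sample)
```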

cb4960
#99
Tori-kun Wrote:Even better: implement PDF compilation to your Novel Formatter!! AWESOMENESS!
See the 青P utility for PDF formatting.
Reply
Thank you! I just tried to process 5000 texts in AntConc. It crashed after reaching ~2 gigs of RAM (out of 8, so I'm assuming it was an AntConc thing). ;p 1000 files worked better, though I'll probably stick with batches of 20-100, sorted by author/genre/readability/whathaveyou. By genre I mean linguistic genre/register as well as literary... There are really so many possibilities.
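
Splitting a big file list into batches like that is easy to script. A minimal sketch, with a made-up function name and made-up file names (sort the list by author or genre first if that's how you want the batches grouped):

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical file names standing in for a corpus directory listing.
files = [f"text_{i:04d}.txt" for i in range(250)]
groups = list(batches(files, 100))
```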
Edited: 2011-06-09, 9:46 pm