kanji koohii FORUM
cb's JNovel Formatter


cb's JNovel Formatter - nest0r - 2011-06-28

It's still early days yet, but check this out.

You use MorphMan to input a stripped copy of a novel and get a list of its unknowns, right? Hopefully there's a way to use regexp to tally them by punctuation and by line, so you can have sentence and paragraph level totals of the unknowns. Also this kind of information changes as we study, of course, so having it integrated as a tool would be good to periodically update the numbers. (Like I just bugged Overture about adding some stuff to the MorphMan GUI thingy.)
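A minimal sketch of that tallying idea in Python, assuming the list of unknowns has already been exported from MorphMan as plain strings. The function names are hypothetical, and plain substring matching stands in for real morpheme matching, so overlapping words can be over-counted.

```python
import re

# A sentence is a run of text followed by any sentence-final punctuation.
SENTENCE = re.compile(r"[^。！？]+[。！？]*")

def count_unknowns(text, unknowns):
    """Count occurrences of each unknown word by plain substring match."""
    return sum(text.count(word) for word in unknowns)

def tally(text, unknowns):
    """Return one (paragraph_total, [(sentence, count), ...]) entry per non-empty line."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sentences = [(s, count_unknowns(s, unknowns)) for s in SENTENCE.findall(line)]
        result.append((sum(count for _, count in sentences), sentences))
    return result
```

Each non-empty line is treated as a paragraph, which matches how the stripped novel text in this thread is laid out. Rerunning it with an updated unknown list refreshes the numbers as you study.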

Once you have that information, you can highlight unknown words or otherwise mark them as visual aids, which would be good for homing in with Rikaisan as you read. Perhaps turn them into inline clozes.

But here's the good part that I'm going to experiment with if I can find out how to count unknowns without that per sentence/line iPlusN information currently in MorphMan (if not I can always do something with importing into Anki then re-exporting): if we have totals per sentence and per paragraph, we can turn those into navigational links!

Like you add a number to the end of each line showing how many unknowns were in that sentence. Then you click it to take you to the next sentence with that number of unknowns! It would make for a smoother reading experience because you wouldn't be excising the intervening lines; they'd be right there for backlooping or simply deciding to push yourself, etc. You could probably add more flexibility to the navigation for going forward and backward to lines with X number of unknowns. Perhaps with keyboard shortcuts? Or hiding and showing sections?
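One way the click-to-jump numbers could be generated, sketched in Python. It assumes each sentence already has its unknown count, and the `s0`, `s1`, ... anchor ids are made up for illustration; this is not something JNF is known to produce.

```python
from html import escape

def add_nav_links(counted):
    """counted: list of (sentence, unknown_count) pairs. Returns one HTML line per sentence."""
    next_same = [None] * len(counted)
    nearest = {}  # unknown count -> index of the nearest later sentence with that count
    for i in range(len(counted) - 1, -1, -1):
        count = counted[i][1]
        next_same[i] = nearest.get(count)
        nearest[count] = i
    lines = []
    for i, (sentence, count) in enumerate(counted):
        target = next_same[i]
        if target is not None:
            label = f'<a href="#s{target}">[{count}]</a>'
        else:
            label = f"[{count}]"  # no later sentence has this count
        lines.append(f'<span id="s{i}">{escape(sentence)}</span> {label}')
    return lines
```

Walking the list backwards finds every "next sentence with the same count" in a single pass. A backward link could be built the same way by walking forwards.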

I presume this would also make it easier to turn texts into cards by number of unknowns (rather than purely through Rikaisan or wholesale).

The paragraph level unknowns would be a mid-level navigator and would also give a more incremental knowledge of unknowns per section of the text.

I don't know, just some formatting stuff to think about that JNF might be able to work with. As always, these are just rough speculations that I'm rambling about.


cb's JNovel Formatter - wccrawford - 2011-06-28

So, in case Nestor didn't get anyone's mental juices flowing enough, it occurs to me that using that kind of information you could make an Anki deck with the optimum study pattern of words in a text. Start with a base set of words that you know, then determine which sentences have 1 unknown (i+1) and sort those by difficulty, and maybe other methods.

Then, factor in the new known words and repeat for the next i+1 sentences, sort, repeat.

Sorting might be by difficulty, or grouping into related words, or ... The possibilities are endless.
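A rough Python sketch of that loop, assuming every sentence has already been segmented into a set of words (e.g. by MeCab). Sentence length is used as a stand-in for difficulty, which is purely a placeholder.

```python
def i_plus_one_order(sentences, known):
    """sentences: list of (text, set_of_words). Returns sentence texts in study order."""
    known = set(known)
    remaining = list(sentences)
    order = []
    while True:
        # Sentences with exactly one unknown word given what is known so far.
        batch = [s for s in remaining if len(s[1] - known) == 1]
        if not batch:
            break
        remaining = [s for s in remaining if len(s[1] - known) != 1]
        batch.sort(key=lambda s: len(s[0]))  # placeholder difficulty: shorter first
        for text, words in batch:
            order.append(text)
            known |= words  # the new word counts as known in the next round
    return order
```

Sentences with no unknowns are left out since they teach nothing new, and sentences that never get down to a single unknown are never scheduled.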


cb's JNovel Formatter - nest0r - 2011-06-28

You know what else could be good? Showing ルビ for only the first X instances of a word occurrence in a text. (Not sure how often this is built into native materials already.) This isn't necessarily tied to Anki knowledge, though it could be, like it could refuse to show ルビ if you know a word, or if it's known to a certain interval level... Well, perhaps ‘refuse’ is a bit strong. ^_^
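Something like the following Python sketch could do it, assuming the text is already tokenized into (surface, reading) pairs with `None` as the reading for kana-only tokens. The `limit` and `known` parameters are hypothetical names for the "first X instances" and "words you already know" ideas.

```python
from collections import Counter
from html import escape

def ruby_first_n(tokens, limit=2, known=frozenset()):
    """Emit <ruby> markup only for the first `limit` occurrences of each unknown word."""
    seen = Counter()
    out = []
    for surface, reading in tokens:
        seen[surface] += 1
        if reading and surface not in known and seen[surface] <= limit:
            out.append(f"<ruby>{escape(surface)}<rt>{escape(reading)}</rt></ruby>")
        else:
            out.append(escape(surface))
    return "".join(out)
```

Tying `known` to Anki intervals would just mean building that set from cards above whatever interval you choose.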


cb's JNovel Formatter - jishera - 2011-06-28

This would be very cool if someone could get it to work reliably. I think conjugations might be a bit difficult, though I suppose as long as the kanji is there you could ignore the endings? Not sure how to determine which pieces are conjugations and which are just hiragana words. I'm sure someone has tried something similar before.

Man, I can't believe how many more tools there are now for learning languages compared to when I was in high school. And that was only 6 years ago! Of course...I didn't search around much in high school for resources, but looking at when all my favorite tools and websites were started, it was after I graduated! When I have kids they are going to have so many tools/plugins it will be insane.


cb's JNovel Formatter - nest0r - 2011-06-28

@jishera

I'm not sure what you mean. Have you used MorphMan? It uses Mecab.

By the way, there's also the way Rikaisan performs (and possibly a future implementation of text glossing as discussed recently in the cbJisho thread) for another potential layer of parsing when looking up the words or glossing them.

Edit: I meant to say, it could lead to more refinement, at least when it comes to definitions via edict, etc., as I've noticed for cards with unknowns found via MorphMan and glossing performed using the sentence gloss plugin (which uses WWWJDIC's method that doesn't use a morphological analyzer), the glosses seem more accurate parsings of a given sentence. At least, in terms of dictionary headword integration and relevance of results, which is expected due to its use of the dictionary for segmentation and prioritized dictionaries for results.

And yes, I'm quite surprised, I thought just a year or so ago, we had established the foundation and the rest would be tweaks, but it feels like we've gone light years ahead and the entire paradigm has shifted...


cb's JNovel Formatter - wccrawford - 2011-06-28

Well, when I was in High School, the internet barely existed. We certainly weren't transferring audio files and video around... Heck, downloading a picture required a special program to decode the MIME data into binary.

And practice? Forget it if there wasn't someone local to talk to. Phone calls were too expensive even to the next county, let alone another country.

Phew. I can feel myself turning into an old man already.


cb's JNovel Formatter - nest0r - 2011-06-28

@jishera

Here's an example of the results you'd get from MorphMan's GUI, if you don't have the plugin installed.

Say you save this text from here, to a UTF-8 .txt:

たまにそういうこと、してみたくなるんですよ。正確さとか意味とか内容のあることばかり追い求めていると、あえて意味のない無駄なことをしたくなります。

「面白い」って感覚は人それぞれですけど、ダジャレの脱力感、好きなんですよね。

そこであらためて、テキストにツッコンでダジャレを追加する『addPuns2TextF』をリリースします。

You select ‘extract morphemes’ via the MorphMan button in Anki, select the above .txt, and save it as a .db, such as addpuns2text.db. It'll say ‘Success’; hit OK. Then you can browse to that .db, open it as database A, and hit the A button to list its morphemes. Or you can also open known.db (where you collect morphemes you know via your decks) as database B and do A-B to get a list of the morphemes in addpuns2text.db that are unknown (i.e. they're not in database B, your known.db).
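The A-B step is just a set difference. A minimal Python illustration, assuming both morpheme lists have already been read out as plain strings (MorphMan's actual .db format is not handled here, and the function name is made up):

```python
def a_minus_b(a_morphemes, b_morphemes):
    """Morphemes in A that are not in B, in first-seen order and without repeats."""
    known = set(b_morphemes)
    seen = set()
    unknown = []
    for morpheme in a_morphemes:
        if morpheme not in known and morpheme not in seen:
            seen.add(morpheme)
            unknown.append(morpheme)
    return unknown
```

Keeping first-seen order means the unknowns come out in the order they appear in the text.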

From there you can use that list + regexp to format the unknown morphemes as they occur in the original addpuns2text.txt; here I've added spaces for clarity, though without spaces you can see how Rikaisan would parse them:

たまに そういう こと 、 し て み たく なる ん です よ。 正確 さ とか 意味 とか 内容 の ある こと ばかり 追い求め て いる と、 あえて 意味 の ない 無駄 な こと を し たく なり ます。

「 面白い 」って 感覚 は 人 それぞれ です けど、 ダジャレ の 脱力 感 、 好き な ん です よね。

そこで あらためて 、 テキスト に ツッコン で ダジャレ を 追加 する 『 addPuns 2 TextF 』を リリース し ます。

So imagine if the unknowns only were highlighted, or the knowns were darkened, or you had some kind of hide/show ability, adjusted the ruby markup for furigana, and there were tallies at the ends of each sentence and paragraph you could click on to navigate to the next i+X in some way (I guess a mixture of scrolling and/or highlighting?).

If you paste the original text above you can see the kind of glossing you'd get if we had an offline tool similar to how WWWJDIC works (http://forum.koohii.com/showthread.php?pid=141672#pid141672), as a supplement/alternative to Rikaisan: http://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?9T

Edit: And of course for priming purposes, combining with frequency sorting would work well... the most frequent unknowns in a text, and/or the unknowns in a text that match the most frequent words in another .db/frequency list (academic papers genre, news genre, blog genre, light novel genre, novel genre, etc.).
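Both orderings are easy to sketch in Python, assuming the text is tokenized and, for the second case, that you have a word-to-rank mapping from some other frequency list (`reference_rank` is a hypothetical name, lower rank meaning more frequent).

```python
from collections import Counter

def rank_unknowns(tokens, known, reference_rank=None):
    """Order unknown words by frequency in this text, or by rank in another list."""
    counts = Counter(t for t in tokens if t not in known)
    if reference_rank is None:
        # Most frequent unknowns in this text first.
        return [word for word, _ in counts.most_common()]
    # Unknowns ordered by an external frequency list; unlisted words go last.
    return sorted(counts, key=lambda w: reference_rank.get(w, float("inf")))
```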


cb's JNovel Formatter - Nagareboshi - 2012-03-02

cb4960 I've been using your NovelFormatter for quite a while now and it is great!

What I would love to see is a function that allows me to automatically convert a large number of novels. I am thinking of a function similar to the one Aozora Remover has. It lets me choose the input and output directory, my settings, and then starts converting everything in the source folder automatically.

If it is not too much to ask (or too complicated), could you perhaps implement such a function in a future update? :-)


cb's JNovel Formatter - cb4960 - 2012-03-03

Nagareboshi Wrote:cb4960 I've been using your NovelFormatter for quite a while now and it is great!

What I would love to see is a function that allows me to automatically convert a large number of novels. I am thinking of a function similar to the one Aozora Remover has. It lets me choose the input and output directory, my settings, and then starts converting everything in the source folder automatically.

If it is not too much to ask (or too complicated), could you perhaps implement such a function in a future update? :-)
Good idea! Maybe I can get it done tomorrow or next weekend.


cb's JNovel Formatter - cb4960 - 2012-03-06

Hello,

I have just released version 5.0 of JNovel Formatter.

Download JNovel Formatter v5.0 via Google Code

What changed?
● Added ability to recursively process all files contained within a folder and its subfolders (Thanks Nagareboshi!)

● Added option to split output file based on character length.

● Added CSS placement options:
1) Embed the CSS inside of each HTML file.
2) Place the CSS in an external .css file that resides in the individual novel's output directory.
3) Place the CSS in an external .css file that resides in the topmost output directory and is accessed by the HTML files via a relative path.
4) Place the CSS in an external .css file that resides in a location of your choice and is accessed by the HTML files via an absolute path.

The last 2 are particularly useful if you have formatted a large number of novels, as you only need to edit a single .css file to change styles for all of them.

● Added option to set the ruby color.

● Updated the sample to reflect the specified margin and ruby color.

● Made the index file a bit more compact.

● Fixed crash conditions when parsing the aozora formatting.
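
To illustrate CSS placement option 3 from the list above, here is a small Python sketch of how a relative stylesheet path could be computed for each HTML file. The file names are hypothetical and this is not JNF's actual code, only one way such a link could be derived.

```python
import posixpath

def css_href(html_path, css_path):
    """Relative URL from an HTML file to a stylesheet, using forward slashes."""
    return posixpath.relpath(css_path, start=posixpath.dirname(html_path))

def link_tag(html_path, css_path):
    """A <link> element pointing from the HTML file at the shared stylesheet."""
    href = css_href(html_path, css_path)
    return f'<link rel="stylesheet" type="text/css" href="{href}">'
```

With a stylesheet at `out/style.css`, a page at `out/novel1/ch1.html` would link to `../style.css`, so editing that one file restyles every novel under `out`.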

cb4960


cb's JNovel Formatter - Nagareboshi - 2012-03-07

Oh my god, awesome! Thank you ever so much! The other options you added are a great addition as well!

I tried it right now and it works, but when I try to select a folder from the GUI, it doesn't let me. Neither can I select all the files in a folder from the GUI. Is it correct that I have to manually type in the folder name(s) for input and choose the one for output? Or add the path to the input folder to the settings file?

The reason I ask is that if this is the way it is intended to work, and given that it works very well the way it is, there is no need to change anything. It might just be worth mentioning that one might have to type in the path manually and that the rest works automatically after pushing the convert button.


cb's JNovel Formatter - cb4960 - 2012-03-08

Hello,

I have just released version 6.0 of JNovel Formatter.

Download JNovel Formatter v6.0 via Google Code

What changed?

● Added additional button for selecting an input directory. (Thanks Nagareboshi!).

● Made some optimizations. The processing should now be about twice as fast as version 5.0.

● ※[#歌記号] is now converted to "〽".

● [#改丁] is now removed.

● Added better support to remove aozora indentation tags.

● Added support for aozora image tags that begin with "[#表紙"
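
The two tag rules in this changelog (※[#歌記号] and [#改丁]) amount to simple text substitutions. A Python sketch, assuming the annotation brackets may appear in either half-width or full-width form; it is an illustration, not JNF's implementation.

```python
import re

OPEN = r"[\[［][#＃]"   # "[#" in half-width or full-width form
CLOSE = r"[\]］]"

SONG_MARK = re.compile("※" + OPEN + "歌記号" + CLOSE)
PAGE_BREAK = re.compile(OPEN + "改丁" + CLOSE)

def apply_tag_rules(text):
    """Convert the song-mark annotation to 〽 and drop the 改丁 annotation."""
    text = SONG_MARK.sub("〽", text)
    return PAGE_BREAK.sub("", text)
```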

cb4960


cb's JNovel Formatter - Nagareboshi - 2012-03-08

You are crazy! :-) Though I appreciate it, there is really no need to thank me for anything ... :-)

This is just a stupid idea, but might be nice to have, that is if you are willing to add it. I'm talking about a text-to-speech function. What I am thinking of here is to use a library that is freely available from Microsoft together with a sample application, name forgotten ..., which allows text-to-speech operation. For JNovel Formatter I am thinking of a function that processes the novel and somehow, magically, puff-the-dragon like, adds voice to sentences and / or single words, or both. Though I can't envision a way to do this without interrupting the use of Rikaisama. So not good if you hover over a word and it speaks it out aloud. You know how it is when you unintentionally hover over an advertisement while having your speakers at max. *shudders* Or maybe text-to-speech processing single word or whole sentences, saved as .mp3, with all the necessary options available such as voice speed, pitch, etc. whatever is available from the library, while processing the novels.

Keep in mind, this is not a request, just some crazy idea that I had in mind. I already have paid software for this and like it. It gives me all the options needed to process material with it, and takes not much time at all. But it might be interesting or nice to have for others who don't own such software, or don't want to - you know click on every word, Rikaisama hoping it is available, and save it automatically to a deck sort of thing. Providing vanilla support for audio without voices that is and further add that to decks or just let them read it out loud by their favorite media player.

So what say you? Would such a function even be possible, or will it remain what it is, just a crazy idea which you aren't going to even consider implementing at all? :-)


cb's JNovel Formatter - cb4960 - 2012-03-08

Nagareboshi Wrote:You are crazy! :-) Though I appreciate it, there is really no need to thank me for anything ... :-)

This is just a stupid idea, but might be nice to have, that is if you are willing to add it. I'm talking about a text-to-speech function. What I am thinking of here is to use a library that is freely available from Microsoft together with a sample application, name forgotten ..., which allows text-to-speech operation. For JNovel Formatter I am thinking of a function that processes the novel and somehow, magically, puff-the-dragon like, adds voice to sentences and / or single words, or both. Though I can't envision a way to do this without interrupting the use of Rikaisama. So not good if you hover over a word and it speaks it out aloud. You know how it is when you unintentionally hover over an advertisement while having your speakers at max. *shudders* Or maybe text-to-speech processing single word or whole sentences, saved as .mp3, with all the necessary options available such as voice speed, pitch, etc. whatever is available from the library, while processing the novels.

Keep in mind, this is not a request, just some crazy idea that I had in mind. I already have paid software for this and like it. It gives me all the options needed to process material with it, and takes not much time at all. But it might be interesting or nice to have for others who don't own such software, or don't want to - you know click on every word, Rikaisama hoping it is available, and save it automatically to a deck sort of thing. Providing vanilla support for audio without voices that is and further add that to decks or just let them read it out loud by their favorite media player.

So what say you? Would such a function even be possible, or will it remain what it is, just a crazy idea which you aren't going to even consider implementing at all? :-)
I think I'll pass for now. There is already an abundance of free TTS software including at least one plugin for Firefox and three for Anki. The only hard part is obtaining a decent quality Japanese TTS voice.


cb's JNovel Formatter - tuliaoth - 2012-04-09

Hi cb, would it be possible to make your JNF output selectable vertical text in HTML?

Purpose: Selectable vertical text would be a useful stepping stone between selectable horizontal text with Rikaichan (i.e. txt novels) and unselectable vertical text without Rikaichan (i.e. scanned/dead-tree novels).

As far as I've read and understood, vertical text in HTML is quite complicated or even downright impossible, but I thought you may be more in the know than I am.

I could only find this for an old version of Firefox:
http://my.opera.com/kailapis/blog/vertical-izer


cb's JNovel Formatter - cb4960 - 2012-04-09

tuliaoth Wrote:Hi cb, would it be possible to make your JNF output selectable vertical text in HTML?

Purpose: Selectable vertical text would be a useful stepping stone between selectable horizontal text with Rikaichan (i.e. txt novels) and unselectable vertical text without Rikaichan (i.e. scanned/dead-tree novels).

As far as I've read and understood, vertical text in HTML is quite complicated or even downright impossible, but I thought you may be more in the know than I am.

I could only find this for an old version of Firefox:
http://my.opera.com/kailapis/blog/vertical-izer
Adding vertical text support would be extremely simple and I don't mind adding an option for it in the next version.

However, the only browser that supports it natively is IE and I don't think any of the others are planning to support it any time soon. I'll have to see how well the "vertical-izer" plugin that you linked to works.


cb's JNovel Formatter - yaya2 - 2015-05-17

nevermind


cb's JNovel Formatter - cb4960 - 2015-08-09

Hello,

I have just released version 7.0 of JNovel Formatter.

Download JNovel Formatter v7.0 via Source Forge

What changed?

● Added Orientation option. This allows you to select either Horizontal or Vertical text orientation. Vertical orientation should work in Chrome, IE, and Firefox 41+. Vertical orientation will not be displayed in the preview window.
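
For anyone curious how vertical orientation can be expressed in HTML output, here is a Python sketch that writes a page using the CSS `writing-mode` property along with its `-ms-` and `-webkit-` prefixed forms. Whether JNF emits exactly these rules is an assumption, and `build_page` is a made-up helper.

```python
from html import escape

VERTICAL_CSS = """body {
  -ms-writing-mode: tb-rl;
  -webkit-writing-mode: vertical-rl;
  writing-mode: vertical-rl;
}"""

def build_page(paragraphs, vertical=True):
    """Return a minimal HTML page; vertical=True lays text out top to bottom, right to left."""
    css = VERTICAL_CSS if vertical else ""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<!DOCTYPE html>\n<html lang="ja">\n<head>\n<meta charset="utf-8">\n'
        f"<style>\n{css}\n</style>\n</head>\n<body>\n{body}\n</body>\n</html>\n"
    )
```

The text stays real, selectable text in either orientation, so popup dictionaries can still read it.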

● All margins may now be modified independently.

cb4960


cb's JNovel Formatter - stephenmac7 - 2015-10-11

Hey, I saw in the changelog "support for linux" but it doesn't seem to work properly... I'm getting an error that it can't read Gaiji codes and:
Quote:Unhandled Exception:
System.DllNotFoundException: urlmon.dll
at (wrapper managed-to-native) JNovelFormatter.FormMain:CoInternetSetFeatureEnabled (int,int,bool)
at JNovelFormatter.FormMain.disableWebBrowserClickNoise () [0x00000] in <filename unknown>:0
at JNovelFormatter.FormMain..ctor () [0x00000] in <filename unknown>:0
at (wrapper remoting-invoke-with-check) JNovelFormatter.FormMain:.ctor ()
at JNovelFormatter.Program.Main () [0x00000] in <filename unknown>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.DllNotFoundException: urlmon.dll
at (wrapper managed-to-native) JNovelFormatter.FormMain:CoInternetSetFeatureEnabled (int,int,bool)
at JNovelFormatter.FormMain.disableWebBrowserClickNoise () [0x00000] in <filename unknown>:0
at JNovelFormatter.FormMain..ctor () [0x00000] in <filename unknown>:0
at (wrapper remoting-invoke-with-check) JNovelFormatter.FormMain:.ctor ()
at JNovelFormatter.Program.Main () [0x00000] in <filename unknown>:0



cb's JNovel Formatter - cb4960 - 2015-10-11

Linux is no longer supported. I will remove that line from the changelog to avoid confusion.


cb's JNovel Formatter - Gensan - 2015-11-03

How can I get the images to work?

I converted the text, then moved all the images to the output folder, but when I open it, it shows ※[#挿絵画像 01_011]挿絵 instead of the image....


cb's JNovel Formatter - ReneSac - 2016-03-11

Another piece of Aozora formatting that you could support:


Code:
001―▼[#ハート黒、unicode2665]01


It is supposed to look like:

Code:
001 — ♥ 01


In vertical text, the best would be for the whole number (e.g. 001) to be in one tile / horizontal line, but I don't know how difficult implementing that would be.
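
A possible approach to both parts, sketched in Python. It assumes the single character right before the annotation (▼ in the example) is a placeholder to be replaced, and that the code point is always written as `unicode` followed by hex digits. The `tcy` class name is made up; it would need a CSS rule such as `.tcy { text-combine-upright: all; }` to set the digits side by side in vertical text, and browser support for that property varies.

```python
import re

# Placeholder character + annotation containing "unicodeXXXX", half- or full-width brackets.
GAIJI = re.compile(r".[\[［][#＃][^\]］]*?unicode([0-9A-Fa-f]{4,6})[\]］]")
DIGITS = re.compile(r"\d{2,4}")

def replace_unicode_gaiji(text):
    """Replace the placeholder and its annotation with the character it names."""
    return GAIJI.sub(lambda m: chr(int(m.group(1), 16)), text)

def wrap_tcy(text):
    """Wrap short digit runs in a span so CSS can combine them into one cell."""
    return DIGITS.sub(lambda m: f'<span class="tcy">{m.group(0)}</span>', text)
```

This only covers the heart substitution and the number grouping; the spacing and dash style shown in the expected result above are left as they are in the source text.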