rrrrrray Wrote: @Savii - If you are using AozoraProvider in Android, you can now open the text from it with Jade Reader. It should be able to read either its TEXT or XHTML format. Let me know if that's not what you want.
I was referring to the special markup codes in the Aozora format; they're a bit more than just text files in SJIS. Especially prominent in my totally innocent light novels are images (which would be a luxury but very nice, even if it's just an inline link/button triggering an image popup or a call to an external image viewer app or something) and furigana (which actually poses a problem when reading, because the parser won't 'see' the entire word in some cases, e.g. inflectables).
As far as I know, furigana is always delimited by 《 and 》 characters. Examples:
ここら辺、線引きが曖昧《あいまい》で踏み違えやすい。努々《ゆめゆめ》注意されたし。
思春期も資本社会の中でしか育《はぐく》まれない、ということだ。
I can imagine that displaying the furigana neatly above the line would be a challenge. Also, in LNs "furigana play" is not unusual, meaning the furigana itself could very well need lookups. I think an easy alternative would be to have the parser skip over 《》 sections so the verb in the second sentence will be properly recognized as 育む. Fading/dimming the text color of furigana sections a bit would be a nice bonus.
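To make the skip-over idea concrete, here is a rough Python sketch (not taken from Jade Reader or any Aozora tool). The names `strip_furigana` and `split_furigana` are made up for illustration, and it assumes a reading never contains a nested 《 or 》.

```python
import re

# A furigana reading sits between 《 and 》 right after the text it annotates.
# Assumption: readings are never nested.
RUBY = re.compile(r"《[^《》]*》")


def strip_furigana(text):
    """Drop every 《...》 section so a word lookup sees the bare word."""
    return RUBY.sub("", text)


def split_furigana(text):
    """Split text into (segment, is_furigana) pairs, e.g. so a reader
    could draw the furigana segments in a dimmer color."""
    segments = []
    pos = 0
    for match in RUBY.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], False))
        segments.append((match.group(0), True))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], False))
    return segments


line = "思春期も資本社会の中でしか育《はぐく》まれない、ということだ。"
print(strip_furigana(line))
print(split_furigana("育《はぐく》まれない"))
```

With the reading removed, 育 and まれない end up adjacent, so a deinflecting parser has a chance of finding 育む. The second helper keeps the readings around for display instead of throwing them away.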
Images can be included like this:
[#表紙(relative/path/to/image.jpg)]
PNGs are used as well. Note that the special characters are fullwidth except for the image path. [#foo] in general seems to indicate an Aozora markup command. I've seen some others as well, for example to control font types and page breaks, and those can simply be stripped from the reader output.
This is not the only format though; HTML style is also common:
<img src="relative/path/to/image.jpg">
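A rough Python sketch of how both image styles could be picked up and the remaining commands stripped. This is an assumption-heavy illustration, not how any existing reader does it: the function names are made up, the patterns assume the fullwidth forms ［＃ ］ and （ ） around an ASCII path, and the 改ページ command in the sample is just a stand-in for "some other command".

```python
import re

# Assumed fullwidth form of the cover image command: ［＃表紙（path）］
AOZORA_IMAGE = re.compile(r"［＃表紙（([^（）］]+)）］")
# Any ［＃...］ command (fonts, page breaks, and so on).
AOZORA_COMMAND = re.compile(r"［＃[^］]*］")
# HTML-style image tag with a double-quoted src attribute.
HTML_IMAGE = re.compile(r'<img\s[^>]*?src="([^"]+)"[^>]*>', re.IGNORECASE)


def extract_images(text):
    """Return image paths from either markup style, in order of appearance."""
    hits = []
    for pattern in (AOZORA_IMAGE, HTML_IMAGE):
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(1)))
    return [path for _, path in sorted(hits)]


def strip_commands(text):
    """Remove image tags and all ［＃...］ commands from the reader output."""
    text = HTML_IMAGE.sub("", text)
    return AOZORA_COMMAND.sub("", text)


sample = '［＃表紙（img/cover.jpg）］本文［＃改ページ］<img src="img/p1.png">終わり'
print(extract_images(sample))
print(strip_commands(sample))
```

The extracted paths could then feed an inline button or an external image viewer, while the stripped text goes to the normal reading view.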
If you need more detailed info: there seems to be some sort of spec in Japanese on the Aozora Bunko website, though it looks like it's aimed at authors rather than programmers. Perhaps this git repository with the Java source of an Aozora interpreter is more useful.