
Community Stories Offline (Microsoft .CHM compiled HTML format)

#1
Greetings, friends! As a UMPC user, I'm often out & about using Anki, on the bus or in other places without Internet, and find myself wishing I could search for kanji stories on Reviewing the Kanji... So I've developed an offline archive of all user-submitted stories, in the form of a compiled HTML module -- behold:

Reviewing the Kanji - 2009-08-16.chm.zip (12MB)
Reviewing the Kanji (+animations) - 2009-08-16.chm.zip
(22MB - includes animated stroke order diagrams of Yamasa Online Kanji Dictionary)

The default precedence of fonts on the main site is:
Hiragino Mincho Pro, ヒラギノ明朝 Pro W3, MS 明朝, MS P明朝

Above these, I've added three brush/handwriting fonts that I prefer for study:
AR PL ZenKai, EPSON 教科書体M, YOzFontEF

So if you have any of those fonts installed, you'll view the characters using them.

Feedback, suggestions, and bug-reports are welcome, but please keep in mind that I am a busy person. From time to time I do plan to release updated versions of this file...

Enjoy! [-:
Doctor Colossus

Viewing .CHM files
.CHM support is built into Windows, although it is often buggy*. Chmox is a viewer for Mac OS X; here's a good overview of viewers available for Linux.

Windows
* If you're getting the error, "The page cannot be displayed," here are my suggestions:

Quote:▪ First, make sure you've chosen to save the file, rather than 'open' it. Files in "Temporary Internet Files" are treated as content from the "Internet zone", and may thus be blocked (KB902225).
KB896054 describes new security restrictions included in Security Update 840315. See KB883260 for more info. about the "Web content zones" it describes. nba at m:pro provides an easy solution, in the form of downloadable registry scripts.
▪ Another gotcha is that the file's path may not contain the hash (#) character (KB319247); this probably isn't your problem, but who knows? It's foiled me many a time with C# e-books...
▪ You can use MJ's Help Diagnostics to verify that all the HTML Help runtime DLLs are installed and registered correctly, and to install them if not.
▪ The file "%userprofile%\Application Data\Microsoft\HTML Help\hh.dat" might be corrupt. This is where .CHM preferences & favorites are stored. You can safely delete it and it'll be regenerated.
▪ If your "Temporary Internet Files" directory is full, it can cause bugginess with .CHM files.
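Two of the checks above (the hash character and the "Temporary Internet Files" location) are easy to automate. Here is a minimal Python sketch; the function name `chm_path_warnings` is hypothetical, and it only inspects the path string, so it cannot detect the registry, DLL, or hh.dat problems.

```python
from pathlib import PureWindowsPath


def chm_path_warnings(path):
    """Return a list of warnings about a Windows .chm path (hypothetical helper)."""
    warnings = []
    # Compare folder names case-insensitively, as Windows does.
    parts = [part.lower() for part in PureWindowsPath(path).parts]
    if "#" in path:
        warnings.append("path contains '#' (see KB319247)")
    if "temporary internet files" in parts:
        warnings.append("file is inside Temporary Internet Files (see KB902225)")
    return warnings
```

An empty list means neither of these two path problems applies; the other suggestions still need checking by hand.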
Edited: 2011-03-21, 11:15 am
#2
For up-to-date stories, any user can just use this: http://forum.koohii.com/showthread.php?tid=3104
#3
I think that only captures your own stories, while Dr. Colossus has captured all public stories. Important difference.
#4
ruiner Wrote:For up-to-date stories, any user can just use this: http://forum.koohii.com/showthread.php?tid=3104
Sweet!
#5
woelpad Wrote:I think that only captures your own stories, while Dr. Colossus has captured all public stories. Important difference.
Ah, I see. I can see how that might be useful for kanji you first learn offline or something.... not sure if Fabrice will be happy about this, though. ;p
#6
I have great respect for Fabrice and cleared the idea with him before posting.

fuaburisu-san wrote:
Quote:Good idea!

I assume that every contributor keeps the rights on their own stories, but all in all this is really a group effort. So I don't think anybody will complain about their stories being collected for offline use Smile

It would be nice to add credits to the RevTK Community as a collective for the authoring of the stories, and include a link to the RevTK website http://kanji.koohii.com.

Other than that, you're welcome!
As his web-site testifies, he's a generous individual hoping to help people around the world to learn the kanji; and he doesn't seem the type to be jealous about hoarding the information his site has helped gather. I agree with him that people should be directed to the online site in case they're unaware of it, so that our community can continue to grow and flourish. And such credits are given on the initial page of the file, as you'll find.

This is intended as a tool for supplementing the online site, for those like me who don't have constant Internet access. I hope that it will be useful for some of you, and I look forward to your feedback! [-:
#7
Cool. :) I only mentioned it because I had some vague notion it'd been discussed a cpl years ago in regards to exporting/archiving stories. I know I spent twice as much time doing RTK purely so that I could make/share stories, so I don't mind (with obvious limitations, but you're 'doctorcolossus', not 'Dr. No', so I'm not worried.)
Edited: 2009-09-02, 8:41 pm
#8
doctorcolossus Wrote:So I've developed an offline archive of all user-submitted stories, in the form of a compiled HTML module -- behold:
Reviewing the Kanji (2009-08-16).chm (12MB)
This is pretty sweet!

CHMs from the interwebs can be a pain in the arse, but one way I found of getting around the 'from the net' security issue was simply to share it in a zip/rar, that way, when people download it and extract it, it appears to have come from a local source. At least this was the case a few months ago when I made the Tae Kim CHM....
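For anyone who wants to script the zip step, here is a minimal Python sketch using the standard `zipfile` module. The helper name `zip_chm` is hypothetical, and whether an extracted file is then treated as local depends on the tool used to extract it, so take this as an illustration of the packaging only.

```python
import zipfile
from pathlib import Path


def zip_chm(chm_path, zip_path=None):
    """Wrap a single .chm file in a .zip archive and return the archive path."""
    chm_path = Path(chm_path)
    # Default to "<name>.chm.zip" next to the original file.
    if zip_path is None:
        zip_path = chm_path.with_name(chm_path.name + ".zip")
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name, not the full directory structure.
        archive.write(chm_path, arcname=chm_path.name)
    return zip_path
```

Since a .CHM is already compressed, the archive will probably not be much smaller; the point is the wrapper, not the size.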
#9
doctorcolossus, you did an excellent job!

Thank you very much.
#10
wonderful idea!
I never thought it was possible
thanks
#11
doctorcolossus Wrote:I've developed an offline archive of all user-submitted stories, in the form of a compiled HTML module
I just mirror the stories site as html: http://ichi2.net/anki/wiki/ContribFugounashi#revtkrip

I guess you could build a squashfs to save space but I've never bothered...
#12
I'd like a copy of the stories as a fully indexed and linked EPWING dictionary. Anyone tried to build one? One day I might get around to it...
#13
cangy Wrote:I'd like a copy of the stories as a fully indexed and linked EPWING dictionary. Anyone tried to build one? One day I might get around to it...
Yeah... that way it could be used even on the iPhone
#14
mistamark Wrote:This is pretty sweet!
Thanks!

mistamark Wrote:one way I found of getting around the 'from the net' security issue was simply to share it in a zip/rar
That's a clever solution. I've updated the link to point to a .ZIP file. I hope that fixes the problems... Thanks. [-:

cangy Wrote:I just mirror the stories site as html: http://ichi2.net/anki/wiki/ContribFugounashi#revtkrip
Thanks for your mirroring script. That code might be helpful if I ever completely automate the process of generating this file. You might want to take a look at the file I made if you haven't -- it's a little better than a straight HTML mirror. I made a script which removes unnecessary elements from the HTML pages, and I pared down all of the JavaScript and CSS to the minimum. Another advantage of the .CHM format is the table of contents, available in a sidebar.

My process consists of 1) mirroring kanji.koohii.com/study/ using WinHTTrack, a nice program for the purpose in Windows; 2) running the downloaded pages through my script, which renames them simply and appropriately (to 1.html, etc), fixes all hyperlinks, removes unneeded stuff, and improves presentation; 3) compiling these into .CHM using a one-off script-generated table of contents.
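Steps 2 and 3 can be sketched roughly as follows. This is Python rather than the PHP script described above, and it is only an illustration: the link pattern `/study/kanji/<number>` is an assumed URL shape (the real mirrored pages may differ), the helper names `pare_down` and `hhc_item` are hypothetical, and the sitemap markup is the usual HTML Help contents (.hhc) entry form as far as I know it.

```python
import re
from html import escape

# Remove whole <script>...</script> blocks.
SCRIPT_RE = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)
# ASSUMPTION: study pages are linked as /study/kanji/<number>.
LINK_RE = re.compile(r'href="(?:https?://kanji\.koohii\.com)?/study/kanji/(\d+)"')


def pare_down(html):
    """Strip scripts and point study links at local files such as 1.html."""
    html = SCRIPT_RE.sub("", html)
    return LINK_RE.sub(r'href="\1.html"', html)


def hhc_item(title, filename):
    """Build one table-of-contents entry in HTML Help sitemap form."""
    return (
        '<LI><OBJECT type="text/sitemap">'
        f'<param name="Name" value="{escape(title, quote=True)}">'
        f'<param name="Local" value="{escape(filename, quote=True)}">'
        "</OBJECT>"
    )
```

Regexes are fragile on arbitrary HTML, but they can be good enough for pages that all come from one template, which is the situation here.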

cangy Wrote:I guess you could build a squashfs to save space but I've never bothered...
If you're interested in my PHP script (mostly regexes) for paring down the HTML pages, or in the output (~80MB) so that you can compress it using squashfs, I can provide either. Alternatively, you could obtain the output by decompiling the .CHM file. hh.exe can do that; it's included with Windows (and is probably on any Windows installation disc), so you could run it under WINE. I think there are several other freely available programs for doing this too.
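As a rough sketch of the decompile step in Python: the command below assumes hh.exe accepts `-decompile <output folder> <file.chm>`, which is the form I've seen used but is worth verifying on your own system. The helper name `decompile_command` is hypothetical, and on non-Windows systems it assumes a `wine` executable is on the PATH.

```python
import sys


def decompile_command(chm_path, out_dir, use_wine=None):
    """Build the argument list for decompiling a .chm with hh.exe."""
    if use_wine is None:
        # Outside Windows, assume hh.exe has to be run through Wine.
        use_wine = not sys.platform.startswith("win")
    # ASSUMPTION: hh.exe takes "-decompile <output folder> <file.chm>".
    command = ["hh.exe", "-decompile", out_dir, chm_path]
    return ["wine"] + command if use_wine else command
```

The returned list can be passed to `subprocess.run(command, check=True)`; building it separately makes it easy to print and inspect before running anything.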

cangy Wrote:I'd like a copy of the stories as a fully indexed and linked EPWING dictionary. Anyone tried to build one? One day I might get around to it...
I didn't know about EPWING. It looks very cool, but I don't know the format yet. I might be interested in helping with that project. I have a pretty good handle on extracting data from these pages now, from generating the ToC. Does EPWING support HTML entries? I liked being able to preserve the aesthetic of the online site in my .CHM and I'd like to do that also in EPWING format to the extent possible.

cescoz Wrote:wonderful idea!
I never thought it was possible
thanks
You're welcome! [-: I'm glad you like it... As for the iPhone, I just turned these results up with a Google search for "iphone chm":
- iChm for iPhone and iPod Touch: iChm is an ebook reader for CHM (Microsoft Compiled HTML Help) files.
- MobileCHM: CHM & LIT Reader

I have no idea whether those programs will work very well, but I would be curious to hear how they do if you try them!
Edited: 2009-09-05, 7:53 pm
#15
By the way, I just had a good idea about further improving this .CHM by including all animated stroke order diagrams from Yamasa Online Kanji Dictionary (example). I just e-mailed them to request permission. I'm not sure if they'll say yes; it seems like it might be a bit of a longshot, but I hope they do! [-:
#16
If Yamasa doesn't come through, could you use the animated stroke order diagrams (SODA) from Jim Breen's kanjidic site (wwwjdic)? (I don't know the source or whether they're available - just a thought)
#17
Yes, I think so -- here's the license. That's a great suggestion; thanks! I do prefer Yamasa's brush-like animations though, so I hope they say yes! We'll see...
#18
doctorcolossus Wrote:I didn't know about EPWING. It looks very cool, but I don't know the format yet. I might be interested in helping with that project. I have a pretty good handle on extracting data from these pages now, from generating the ToC. Does EPWING support HTML entries? I liked being able to preserve the aesthetic of the online site in my .CHM and I'd like to do that also in EPWING format to the extent possible.
Some markup is supported. There are some utilities here: http://www.hloeffler.info/epwing/epwing_...html#tools
#19
Hattori Foundation - The Yamasa Institute Wrote:Yes, provided you include our copyright and a hyperlink to http://www.yamasa.org beside each image. I think this would be a useful resource for language learners. Please consider this email an authorization. Good luck with your project.
[-:

Thanks, Yamasa!!!

So I've released a new version including all stroke order diagrams available from Yamasa's dictionary (about two-thirds of all the 3,007 characters included in Remembering/Reviewing the Kanji) -- see my initial post above, which I edited to include the new link. The new file including all of these animations is 10MB larger, so I'll leave the lighter-weight version available too.

It took me a couple of days to conjure up a workaround to an Internet Explorer bug in displaying animated GIFs which started out as hidden. ::rolls eyes:: I'll be curious to hear about whether or not it works now in CHM viewers other than Windows' standard built-in one. Clicking on a kanji ought to display a pop-up image of the stroke-order animation. Clicking the animation ought to close the pop-up again.

I also added some info. about font precedence to the initial post.

@cangy: I feel really satisfied having gotten this newest version of the CHM completed. And I'm all for attempting a port to EPWING, but I'm putting this project down for the moment and it will probably be several months before I devote more energy to it. I'm going to Cracow, Poland next month for the CELTA training program at International House. Of course I have a lot of loose ends to tie up before leaving, and I also promised to deliver on the CMS I've been writing for the Free School Denver web-site -- another pet project -- by sometime next week! Etc. [-; I've bookmarked your link to conversion software and format specs; I'll be back!
Edited: 2009-09-11, 4:39 pm
#20
Updated the download links, which had been broken for a few months. The compiled stories, however, are still only current as of late 2009. d-:

Due to changes to the site's format, I'll need to revise my script a bit before I can post a new compiled rip. It probably won't be as big of a deal as I think, but I don't have time for messing around with it yet -- I hope to do it this summer. [-: Cheers.
Edited: 2011-03-21, 11:26 am
#21
Please note the recently added license page; it is linked in the footer and on the About page.