I need a Heisig/Kanji database for a school project

Green_Airplane Member
From: Slovakia Registered: 2007-09-21 Posts: 48

Hello,
I've decided to do something similar to this site for my school project. The course is called 'Advanced Database Technologies', so the main focus is on the database-level solution.
What I plan to do is basically fuse this site's functionality with that of Wakan - http://wakan.manga.cz and some features of Anki to get some sort of combined Japanese learning system - Heisig for Kanji + Kanji readings-RTKII or whatever else you wish + a full-fledged Kanji database + extensive Japanese dictionary + adding&reviewing your own vocab.
Probably only a small portion of this will be actually implemented, since I have many more time-demanding courses this semester.
For this project I obviously need a Kanji database in some format I can work with, and of course Mr. Heisig's keyword->character database.
Could I somehow gain access to this site's database? Or do you know some other source where I could get it?
I plan to do this using the Oracle platform, but any format will help as long as I can convert it to something usable. I assure you I do not plan to sell or otherwise distribute the application, or to violate Mr. Heisig's rights, or this site's users' rights in any way.
I have some ideas for improvement, for a more complex implementation of the Heisig system. I'll describe some of them later.
Thank you in advance for your help and support. With luck, this project could evolve into a useful application.

nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Are you already aware of Fabrice's Trinity project for this site? Try a forum search or various articles in his news archive for more info (here's a starting point: http://kanji.koohii.com/newsdetail.php?id=67)

Bartleby New member
Registered: 2008-06-07 Posts: 7

Seems like you are searching for the KANJIDIC file from this project:
http://www.csse.monash.edu.au/~jwb/edict.html
As far as I can tell, the first meaning given for each kanji in this file is the keyword Heisig uses, with very few exceptions (for example, it can differ where Heisig changed a keyword between editions).

ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

Yes, you will find plenty of data files on Jim Breen's site. For the Heisig kanji google up "ziggr heisig" or something like that for an index file. But you can also simply read the Heisig Frame # field from the KANJIDIC data file (they come in CSV or XML formats).
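
To illustrate reading the Heisig frame number, here is a minimal Python sketch. It assumes the plain-text KANJIDIC layout, in which each line starts with the kanji and its JIS code, followed by space-separated coded fields (`L` for the Heisig index, `S` for stroke count, `F` for frequency rank) and then meanings in curly braces. The function name and the shortened sample line are only for illustration; check the values against the real file, which is EUC-JP encoded (for example `open(path, encoding="euc_jp")`).

```python
import re

# Coded fields of interest: L = Heisig index, S = stroke count, F = frequency rank.
FIELD = re.compile(r"(L|S|F)(\d+)")
KEYS = {"L": "heisig_frame", "S": "strokes", "F": "frequency"}


def parse_kanjidic_line(line):
    """Parse one KANJIDIC text line into a small dict (hypothetical helper)."""
    # Everything before the first "{" holds the kanji, JIS code, fields and readings.
    head, _, _ = line.partition("{")
    tokens = head.split()
    entry = {
        "kanji": tokens[0],
        "heisig_frame": None,
        "strokes": None,
        "frequency": None,
        "meanings": re.findall(r"\{([^}]*)\}", line),
    }
    for token in tokens[2:]:  # skip the kanji and the JIS code
        match = FIELD.fullmatch(token)  # "DL2966" must not be mistaken for "L..."
        if match and entry[KEYS[match.group(1)]] is None:
            # Keep only the first occurrence of each field.
            entry[KEYS[match.group(1)]] = int(match.group(2))
    return entry


# Shortened sample line in the KANJIDIC layout, for illustration only.
SAMPLE = "亜 3021 U4e9c G8 S7 F1509 L1809 DL2966 ア つ.ぐ {Asia} {rank next}"
entry = parse_kanjidic_line(SAMPLE)
print(entry["kanji"], entry["heisig_frame"], entry["meanings"])
```

Matching whole tokens is the important part: several other dictionary codes also contain the letter L, so a plain substring search would pick up the wrong numbers.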

You wouldn't be able to use this site's database effectively, as it's been designed for speed and not for completeness. It holds only the data actually used or displayed somewhere on the site, and the indexes and keys are designed accordingly. However, there is a project for a common database format, already thought out for you, which accommodates the whole KANJIDIC and EDICT data files, i.e. something like what they need to use on WWWJDIC.

Jarvik7 Member
From: 名古屋 Registered: 2007-03-05 Posts: 3946

That would be JMDICT, which is the only current dictionary from Breen. EDICT and KANJIDIC are generated from that on a less frequent schedule. It comes in XML.

Green_Airplane Member
From: Slovakia Registered: 2007-09-21 Posts: 48

I just got into the actual implementation of the project. (so far it's only been lots of pointless documentation, and I've spent most of the time working on other projects, but the deadline is getting closer...)
I am most pleased with edict and kanjidic; I will use them to populate my dictionary and kanji database respectively. The only data I'm missing is stroke count and frequency of use, which sadly the kanjidic doesn't seem to contain. I am prepared to drop them from my project, unless you know of some other source where I could get them.
Since I don't know of any implementations of the RtK2 system (granted it's far less popular than RtK1) I've set implementing RtK2 as one of my project's goals. For this I obviously need a RtK2 database. The RtK2 frame looks something like this:

RtK2 index
the character
compound
character reading
compound reading
compound meaning
cross reference to RtK1

I don't know whether such a thing is available or even exists; I haven't found one yet. Does anyone by any chance happen to know of such a database? :-)
If there isn't one, I'll just make a sample database manually, with say 20 entries, and say 'see, this is how it's supposed to work'.
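
A hand-made sample along those lines could be sketched as follows. This is only an assumption-laden illustration: it uses Python's built-in `sqlite3` as a stand-in for Oracle, the table and column names are invented to mirror the frame fields listed above, and the single row is a placeholder written for this example, not an entry copied from the book.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory table mirroring the RtK2 frame fields (names are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE rtk2_frame (
            rtk2_index       INTEGER PRIMARY KEY,  -- RtK2 index
            kanji            TEXT NOT NULL,        -- the character
            compound         TEXT NOT NULL,        -- example compound
            kanji_reading    TEXT NOT NULL,        -- character reading
            compound_reading TEXT NOT NULL,        -- compound reading
            compound_meaning TEXT NOT NULL,        -- compound meaning
            rtk1_frame       INTEGER NOT NULL      -- cross reference to RtK1
        )
        """
    )
    # Placeholder row: the rtk2_index value in particular is made up.
    conn.execute(
        "INSERT INTO rtk2_frame VALUES (?, ?, ?, ?, ?, ?, ?)",
        (1, "一", "一月", "イチ", "いちがつ", "January", 1),
    )
    conn.commit()
    return conn


def frames_for_kanji(conn, kanji):
    """Return all sample frames for one character, ordered by RtK2 index."""
    cursor = conn.execute(
        "SELECT rtk2_index, compound, compound_reading, compound_meaning, rtk1_frame "
        "FROM rtk2_frame WHERE kanji = ? ORDER BY rtk2_index",
        (kanji,),
    )
    return cursor.fetchall()


conn = make_sample_db()
print(frames_for_kanji(conn, "一"))
```

Keeping the RtK1 cross reference as a plain integer column means it can later be joined against whatever kanji table holds the Heisig frame numbers.
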
Lastly, I'd like each character to have a list of the elements it's composed of. Originally this was meant to be consistent with the Heisig element system (I experimented with this idea a couple of years ago when I was making my own flashcard program). But some of the elements are not themselves characters (so there is nothing in the charset to represent them), and they change meanings, even shapes, according to their position in the character, so it gets more and more complicated. I'll probably take the same approach as with the RtK2 system: make a small sample to demonstrate the functionality. Unless of course you know of some suitable database....
Thank you again, you've been most helpful.

ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

Regarding the decomposition of the characters have a look at the RADKFILE.
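
A rough Python sketch of how the RADKFILE could be turned into a per-character element list, assuming its usual layout: comment lines start with `#`, each component is introduced by a line of the form `$ <component> <stroke count>`, and the lines that follow list the kanji containing it. The function name and the tiny excerpt are made up for illustration; the real file is EUC-JP encoded and would be read with something like `open(path, encoding="euc_jp")`.

```python
from collections import defaultdict


def invert_radkfile(text):
    """Map each kanji to the components listed for it (hypothetical helper)."""
    kanji_to_parts = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("$"):
            # Header line: "$", the component, its stroke count, optional extras.
            current = line.split()[1]
            continue
        if current is None:
            continue  # kanji line before any header; ignore it
        for kanji in line:
            kanji_to_parts[kanji].append(current)
    return dict(kanji_to_parts)


# Tiny made-up excerpt in the RADKFILE layout, for illustration only.
SAMPLE = """\
# sample comment line
$ 一 1
亜唖
$ 口 3
亜唖右
"""

parts = invert_radkfile(SAMPLE)
print(parts["亜"])
```

The KRADFILE that ships alongside it already stores the inverted view (one kanji per line followed by its components), so inverting by hand is only needed when starting from the RADKFILE.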

Katsuo M.O.D.
From: Tokyo Registered: 2007-02-06 Posts: 887 Website

@Green_Airplane. I've been adding data to my own kanji database for many years. Perhaps some of it may be useful for you. If you let me know a suitable email address I'll send a file.

Green_Airplane Member
From: Slovakia Registered: 2007-09-21 Posts: 48

green
.
airplane
[at]gmail
thank you :-)

Last edited by Green_Airplane (2008 November 25, 3:12 am)

Cheesemaster64 Member
From: USA Registered: 2008-07-21 Posts: 74

I could be mistaken, but I was always told NEVER to put your email address in a forum post, because of all the jerks out there who write programs that search for email addresses and spam them. You should edit your post and do something like this:

green.airplane
atgmail.com

Last edited by Cheesemaster64 (2008 November 25, 2:17 am)

Green_Airplane Member
From: Slovakia Registered: 2007-09-21 Posts: 48

yes, I know. spammers and their crawlers. whatever I do, some of them get hold of my email anyway. So I kind of stopped caring...
anyway, I get much more spam from the 'legitimate spammers' than from the others. Like jpod101... don't get me wrong, I love the lot, but their mailing policy keeps me wondering whether I should put them in the ignore list...
Back on topic, the radkfile is mighty useful. All I need now is a RtK2 database...

Last edited by Green_Airplane (2008 November 25, 3:20 am)

ファブリス Administrator
From: Belgium Registered: 2006-06-14 Posts: 4021 Website

Besides, please remember that unless the user specifically opted out in their Profile, there should be an "E-mail" link under their username to the left. That lets other members contact you through the PunBB contact form, which means that your own email address is not disclosed until the contactee chooses to reply to you.

Green_Airplane Member
From: Slovakia Registered: 2007-09-21 Posts: 48

Yes, but this way you can't send an attachment :-)

Green_Airplane Member
From: Slovakia Registered: 2007-09-21 Posts: 48

I'd like to thank you all again, especially Katsuo. I used Katsuo's database to fill my kanji table, as I found it easier to parse than kanjidic. I also used kradfile for kanji elements, and edict for the Japanese-English dictionary (someone could have warned me that it contains 161 472 entries :-))
My database is now filled with data, and all that remains is to code some application logic. Oh dear...
