Joined: Apr 2008
Posts: 11
Thanks: 0
Hello,
As I study Japanese words and sentences in a flashcard program, I decided to add a field on the answer side to show Heisig's keywords.
Example:
Question: 最低
Answer:
reading: さいてい, meaning: at least, etc.
Heisig Keywords: [utmost][lower]
I am looking for a script that can generate the "Heisig Keywords" field automatically. It could be written for a flashcard program, a spreadsheet, etc. I have already searched the Internet but couldn't find any such utility.
Does anyone know of a ready-made script for this?
Thanks in advance.
Joined: Feb 2007
Posts: 28
Thanks: 0
Here's the script. You'll need Python installed on your system. Paste the script into a text file, edit the path to the Heisig data file, make the script executable, and run it. I believe I got the Heisig file through the Anki website; I may have downloaded the Heisig Anki database and exported it to get the text file. Anyway, if you have trouble with that, e-mail me and I'll send you a copy. It's too big to paste here.
#!/usr/bin/python
# Usage: Search column 3 of inputfile for kanji. Rewrite each line in inputfile to the screen with an additional Heisig field at the end.
# inputfile is assumed to be tab-separated and UTF-8 encoded.
#
#   ./addheisig.py inputfile 2 > outputfile    # Column numbering starts with 0
#
# Same as above, but format the output with just the keywords, not the kanji.
#
#   ./addheisig.py inputfile 2 -k > outputfile
#
import sys, codecs

RTK = 'heisig-data-2.txt'  # Change this path/file name for your system

file = codecs.open(sys.argv[1], 'r', encoding='utf-8')
col = sys.argv[2]  # Column in the input file where we look for kanji
keywords_only = False
if len(sys.argv) > 3 and sys.argv[3] == '-k':
    keywords_only = True

for line in file:
    searched = []
    heisig = []
    parts = line.split('\t')
    for ch in parts[int(col)]:
        if ord(ch) >= 0x4E00 and ord(ch) <= 0x9FBF and ch not in searched:
            searched.append(ch)
            rtk = codecs.open(RTK, 'r', encoding='utf-8')
            for entry in rtk:
                rtkparts = entry.split('\t')
                if rtkparts[0] == ch:
                    if keywords_only:
                        heisig.append(rtkparts[1])
                    else:
                        heisig.append(rtkparts[0] + '-' + rtkparts[1])
                    break
            rtk.close()
    print line.strip().encode('utf-8'),
    print '\t'.encode('utf-8'),
    if len(heisig) != 0:
        # Oddly, if these lines are combined into one, the output cannot be redirected to a file.
        h = ', '.join(heisig)
        print h.encode('utf-8'),
    else:
        print " ".encode('utf-8'),
    print ''

file.close()
sys.exit(0)
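For anyone on Python 3, here is a rough sketch of the same lookup, not a drop-in replacement for the script above. It assumes the Heisig data file is tab-separated with the kanji in the first column and the keyword in the second, which is how the script reads it. It loads the data once into a dictionary instead of reopening the file for every kanji. The function names load_keywords and heisig_field are made up for illustration, and the two-line sample data stands in for the real file.

```python
import io


def load_keywords(lines):
    """Build a {kanji: keyword} dict from tab-separated lines (kanji, keyword, ...)."""
    table = {}
    for entry in lines:
        parts = entry.rstrip('\n').split('\t')
        # Keep the first entry for each kanji, like the 'break' in the original loop.
        if len(parts) >= 2 and parts[0] not in table:
            table[parts[0]] = parts[1]
    return table


def heisig_field(text, table, keywords_only=False):
    """Return the Heisig field for the kanji in text, in order, without repeats."""
    seen = []
    out = []
    for ch in text:
        # Same code point range check as the original script.
        if 0x4E00 <= ord(ch) <= 0x9FBF and ch not in seen:
            seen.append(ch)
            if ch in table:
                out.append(table[ch] if keywords_only else ch + '-' + table[ch])
    return ', '.join(out)


# Made-up sample standing in for the real Heisig data file (assumption).
sample = io.StringIO('最\tutmost\n低\tlower\n')
table = load_keywords(sample)

print(heisig_field('最低', table))                      # 最-utmost, 低-lower
print(heisig_field('最低', table, keywords_only=True))  # utmost, lower
```

To use it on real data, open the Heisig file with open(path, encoding='utf-8') and pass the file object to load_keywords, then call heisig_field on the chosen column of each input line.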
Edited: 2009-06-02, 11:09 pm