There is a Chrome and Firefox extension called Characterizer that replaces RTK keywords with kanji on websites. I wrote a simple Ruby script that does the same thing for plain text files:
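require "open-uri"
# Read the text to convert (expects input.txt in the current directory).
input = File.read("input.txt")
# Fetch the keyword list, skip comment lines, and keep the first 2200 entries.
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs.
URI.open("http://jptxt.net/rtk-keywords.txt").read.split("\n").grep(/^[^#]/)[0..2199].each { |k|
  # Each line is semicolon-separated: field 0 is the kanji, field 2 is the keyword.
  kanji, keyword = k.split(";").values_at(0, 2)
  # Replace whole-word, case-insensitive matches, keeping adjacent quotes, ellipses, and dashes.
  input.gsub!(/(\b|[“”‘’…—])#{Regexp.escape(keyword)}(\b|[“”‘’…—])/i, "\\1#{kanji}\\2")
}
puts input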
The output looks like this:
Jofuku was the Wise 男 之 China. 多 books 彼 読, and 彼 never
forgot 何 was 中 them. 皆 the characters 彼 knew as 彼 knew the lines
中 the palm 之 his 手. 彼 learned secrets 乃 birds and beasts, and
herbs and flowers and trees, and rocks and metals. 彼 knew magic and
poetry and 哲. 彼 grew 満 之 years and 智. 皆 the 民
honoured him; but 彼 was 勿 happy, for 彼 had a 語 written upon his
心.
On OS X, for example, you can save the script as ~/script.rb and the input text as ~/input.txt, then run "ruby script.rb" in Terminal (a new Terminal window starts in your home directory by default).
I uploaded a sample output text to http://jptxt.net/japanese-fairy-tales.txt.
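If you want to try the replacement rule without downloading the keyword list, here is a minimal offline sketch. The four pairs are a hand-picked stand-in that I inferred from the sample output above, not the contents of rtk-keywords.txt, and the names SAMPLE_PAIRS, EDGE, and replace_keywords are made up for this example.

```ruby
# Hypothetical stand-in for the downloaded list: [kanji, keyword] pairs
# inferred from the sample output, not read from rtk-keywords.txt.
SAMPLE_PAIRS = [["彼", "he"], ["中", "in"], ["之", "of"], ["手", "hand"]]

# Same edge rule as the script: a word boundary, or one of the
# typographic characters that may sit directly next to a keyword.
EDGE = '(\b|[“”‘’…—])'

# Returns a copy of text with every whole-word keyword replaced by its kanji.
def replace_keywords(text, pairs)
  pairs.reduce(text) do |result, (kanji, keyword)|
    pattern = /#{EDGE}#{Regexp.escape(keyword)}#{EDGE}/i
    # Put back whatever the edge groups captured around the keyword.
    result.gsub(pattern) { "#{$1}#{kanji}#{$2}" }
  end
end

puts replace_keywords("He knew the lines in the palm of his hand.", SAMPLE_PAIRS)
# → 彼 knew the lines 中 the palm 之 his 手.
```

Note that "the" and "lines" are left alone even though they contain "he" and "in", because the pattern only matches when the keyword starts and ends at a word boundary or at one of the listed punctuation characters.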
Edited: 2013-12-13, 9:23 am
