cb's Kanji Word Association Tool - kanji koohii FORUM (http://forum.koohii.com), Learning Japanese > Learning resources
cb's Kanji Word Association Tool - cb4960 - 2013-06-29

Hello,

Kanji Word Association Tool was created for students who want to learn kanji and words at the same time as efficiently as possible. Based on a user-provided list of kanji, this tool generates a list of words associated with each kanji and ensures that each word consists only of kana and of kanji that you have already studied up to that point. In addition, words are sorted by frequency and no word is used more than once.

For example, assume that 径 is the 882nd kanji in the user-provided kanji list and that we are using the default options. The words generated for 径 will only contain kanji from the 1st through the 882nd kanji in the list.

Download the latest version via SourceForge (source code is available as well). You will need Windows (XP/Vista/7/8) and .NET Framework 3.5 installed.

Options:

Have fun!
cb4960

cb's Kanji Word Association Tool - cb4960 - 2013-06-29

Here is the output of the sample kanji lists (using default settings): http://www.mediafire.com/download/zyupo53n0g9hz9y/kwat_sample_output_130915.zip

=========================================================

For those who want to generate a custom dict.dat for use with Kanji Word Association Tool, use Kanji Word Association Tool Dictionary Generator. Download the latest version via SourceForge (source code is available as well). You will need Windows (XP/Vista/7/8) and .NET Framework 3.5 installed.
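The association rule from the first post can be sketched in Python as below. This is not KWAT's actual code (KWAT is a .NET program); it is a minimal sketch assuming the dictionary is a list of (word, frequency) pairs where a larger number means a more common word, and that only characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF) count as kanji. Because a word must contain the current kanji and may only contain kanji studied up to that point, the only kanji a word can ever be associated with is the last-studied kanji it contains, so one pass over the dictionary can bucket every word that way. The `max_words` cap is a hypothetical option.

```python
from collections import defaultdict


def is_kanji(ch):
    # Assumption: only the CJK Unified Ideographs block (U+4E00..U+9FFF) counts as kanji;
    # kana, punctuation, and marks such as 々 are treated as non-kanji.
    return "\u4e00" <= ch <= "\u9fff"


def associate(kanji_list, dictionary, max_words=None):
    """Map each kanji (in study order) to its associated words, most frequent first.

    kanji_list: kanji in the order they are studied.
    dictionary: iterable of (word, frequency) pairs; a larger frequency means
        a more common word (assumption about the data, not KWAT's dict.dat format).
    max_words: optional per-kanji cap (hypothetical option); None keeps all words.
    """
    position = {}
    for i, k in enumerate(kanji_list):
        position.setdefault(k, i)  # if a kanji is listed twice, its first position wins

    buckets = defaultdict(list)
    seen = set()
    for word, freq in dictionary:
        if word in seen:
            continue  # never use the same word twice
        kanji_in_word = [ch for ch in word if is_kanji(ch)]
        if not kanji_in_word:
            continue  # kana-only words have no kanji to attach to
        if any(ch not in position for ch in kanji_in_word):
            continue  # word contains a kanji that is not in the study list
        seen.add(word)
        # The word becomes learnable at its last-studied kanji, and that kanji is the
        # only one satisfying "contains this kanji, and only kanji studied so far".
        buckets[max(position[ch] for ch in kanji_in_word)].append((freq, word))

    result = {}
    for i, k in enumerate(kanji_list):
        if position[k] != i:
            continue  # skip repeated entries in the kanji list
        ranked = sorted(buckets[i], key=lambda fw: -fw[0])  # stable: ties keep input order
        result[k] = [word for _, word in ranked][:max_words]
    return result
```

For example, with the study order 日, 本, 人, 語, the word 日本 is listed under 本 rather than 日, and 日 itself can end up with no words at all, which is the kind of empty entry discussed later in the thread.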
cb's Kanji Word Association Tool - toshiromiballza - 2013-06-30

I wrote a (much simpler) script to do this a year or two ago, but limited it to looking for jukugo only. I'm sure this will be much more useful. As always, thank you for your awesome projects!

By the way, \Entries_On_Same_Line\output_rtk_lite.tsv, line 135 for example, it found no words. Maybe in such cases exclude the kanji from the output, save the kanji with no words in a separate file, or append them at the top of the outputted text?

cb's Kanji Word Association Tool - cb4960 - 2013-06-30

toshiromiballza Wrote: By the way, \Entries_On_Same_Line\output_rtk_lite.tsv, line 135 for example, it found no words. Maybe in such cases exclude the kanji from the output, save the kanji with no words in a separate file, or append them at the top of the outputted text?
Thanks for the suggestion. I'll see about adding an option to do that in the next version when "Place all entries on a single line" is selected.

cb's Kanji Word Association Tool - ja_min - 2013-07-01

Hello there. I was just passing through near this website and happened to notice this very useful looking tool. Thank you for making it!

In the future, do you think there might be more interface options for creating and maintaining multiple dictionaries? Also, perhaps a small area for users to paste in a list of kanji would be useful. If, for example, they were regularly inputting batches of mature kanji into your tool, in order to find associated words. Even better would be an ability to automatically pull mature kanji from Anki, but I'm sure that's a difficult function to be added to some future list of wondrous dream features.

Well, thank you for reading! So long!

cb's Kanji Word Association Tool - vileru - 2013-07-01

Pair this program with Epwing2Anki to SRS the new vocabulary generated by the word association tool.

cb's Kanji Word Association Tool - Thora - 2013-07-01

ja_min Wrote: Hello there.
I was just passing through near this website and happened to notice this very useful looking tool.
Welcome ja_min. I'm interested in some of the ideas on your site. I'd be grateful if you would message me so that I might be able to contact you in the future to discuss them. Cheers.

cb's Kanji Word Association Tool - cb4960 - 2013-07-02

ja_min Wrote: In the future, do you think there might be more interface options for creating and maintaining multiple dictionaries?
What kind of options did you have in mind?

ja_min Wrote: Also, perhaps a small area for users to paste in a list of kanji would be useful. If, for example, they were regularly inputting batches of mature kanji into your tool, in order to find associated words.
Perhaps some day. For now, though, you'll have to paste these batches at the end of the kanji list input file.

ja_min Wrote: Even better would be an ability to automatically pull mature kanji from Anki, but I'm sure that's a difficult function to be added to some future list of wondrous dream features.
Probably not going to happen.

cb's Kanji Word Association Tool - Aspiring - 2013-07-03

Great site, ja_min. And thanks for this wonderful tool, cb4960.

cb's Kanji Word Association Tool - cb4960 - 2013-07-03

I wish I had had ja_min's Phase 1 when I was a beginner. It would have cleared up a lot.

cb's Kanji Word Association Tool - louischa - 2013-07-04

すごい! (Amazing!) You just saved me hundreds of hours of tedious work.

I did a first pass through Heisig two and a half years ago, but after a while I got bored with Anki and busy with work, so I forgot most of the characters. I also got bored with the Japanese material I was starting to read (slowly and tediously). From what other people have written here (Nuke in particular), I believe that this situation is not so rare, so I have no qualms about reporting it.
I should add that I have no plans to visit Japan (it's too expensive and I have no time for that), no Japanese friends (why don't you kill me, I'm a loser baby ;-) ), and I can barely speak, so Skyping/chatting is out of the question. Actually, I have no pressing reason to learn the language, except a strong love for Japanese culture in general. So it was easy to let language learning slip; c'est la vie (that's life)!

This time, I decided that I was not going to "do" all the characters in one pass. The problem with studying kanji is that you cannot productively learn words as you go, because you need to reach a certain critical repertoire of characters before you can find frequent combinations, unless you are using the Hadamitzky-Spahn order (their book "Kanji and Kana"). But since I am a big believer in Heisig, I am using his order. So I stopped at 1400, which provides a comfortable repertoire to fish actual words from. Stopping adding characters was actually very hard, since "learning" new characters creates a habit and an inertia. This is good in a sense, but it has its drawbacks. One of them is that a pass through Heisig does not qualify as deep learning, just sloppy learning at best: you barely know anything when you just associate a glyph with a meaning. By the end of all 2200, you have a huge repertoire of characters for which you have only a fuzzy idea of their meaning(s), and which you have never encountered in real words. So it is easy for the rarest ones to recede from memory, unless you undertake a systematic study of vocabulary (you should learn words associated with each character you know, not just rely on a list like the Core2K, which will not cover everything, despite its virtues). Another issue I had during the first pass was my horrible recognition time when encountering a character I "knew": recognition clearly has to be much faster, and the fewer characters your repertoire contains, the more easily you can build recognition speed.
I plan to spend about 1 year learning systematic combinations from this basic repertoire, based on frequency, hoping that:
1) It will "defuzzify" the meaning of the most difficult characters.
2) It will reinforce the on-readings that I know, and teach me the kun-readings.
3) It will speed up my recognition of these 1400 characters to less than, say, 1/2 second.
4) Since 1400 characters are easier to manage than 2200, there is a lower risk that I will forget everything, provided I can stick to it for 1 year.

After I am done with these 1400 characters, I will simply add the remaining characters on the list, learning 10-20 actual Japanese words for each new character in Heisig as I go. That way, no more fuzzy meanings for characters 1401-2200. Only then will I tackle native material.

I'll post an update on my experience next summer; I hope that some people will find these musings useful in the meantime. Thanks again to cb for his extraordinary contributions.

cb's Kanji Word Association Tool - cb4960 - 2013-07-04

Thanks for the musings, louischa. I'm sure beginners will find them useful.

cb's Kanji Word Association Tool - ja_min - 2013-07-05

@cb4960 Never mind! I can't think of any improvements worth considering at the moment, re: dictionaries or whathaveyou.

Actually, it could be interesting to get specific types of readings, like having "2 on'yomi, 1 kun'yomi," that sort of thing. That would end up giving users specific types of words: compound words and those with okurigana. But I suppose that's more difficult than it sounds. Maybe it could be achieved in a roundabout fashion if the tool could associate words that consist of kanji with hiragana, which would give you the kun'yomi words.

Thank you for your Text Analysis Tool update. Now if only it factored in parts of speech (e.g. content words)... ;p Also, my guide wouldn't have existed when you were starting out, as it relies on tools you made after starting out!

@Aspiring - Thanks for your support!
I have just put the finishing touches on the completed guide.

@Thora - Hi Thora! Unfortunately ja_min is just a humble bot, a solitary, empty mechanical thing plodding along the web promoting itself, incapable of human-level interaction or the bright voice of reason that you occasionally send out into the truculent netherworld. However, simple programs gradually grow more complex, and with the aid of the message option on your profile, perhaps one day ja_min will evolve into a bot capable of email!

Take care and thanks for the support,
ja_min

cb's Kanji Word Association Tool - Sebastian - 2013-07-05

Great work as always, cb4960! Is there any chance that you or someone else will eventually create something similar for ordering a list of individual kanji according to their components? That is, basically ordering any list of kanji the way they are ordered in Remembering the Kanji, so that you only study kanji for which you have already studied all their components. I guess for that you would need detailed info on each kanji's structure, which you can't take from normal dictionary files, but something like the KanjiVG project could be of use in that respect.

Quote: KanjiVG is a description of the sinographs (or kanji) used by the Japanese language. For each character, it provides a SVG file that gives the shape, direction and order of each of its strokes. This file is also enriched with exhaustive information about the components of the character, the type of stroke employed, etc. See Format for detailed information.

cb's Kanji Word Association Tool - cb4960 - 2013-07-05

ja_min Wrote: Actually, it could be interesting to get specific types of readings, like have "2 on'yomi, 1 kun'yomi," that sort of thing. Which would end up giving users specific types of words: compound words and those with okurigana. But I suppose that's more difficult than it sounds.
Maybe it could be achieved in a roundabout fashion if the tool could associate words that consist of kanji with hiragana, which would give you the kun'yomi words.
Thanks for the suggestion.

ja_min Wrote: Thank you for your Text Analysis Tool update. Now if only it factored in parts-of-speech (e.g. content words)... ;p
Will probably do this someday.

Sebastian Wrote: Is there any chance that you or someone else eventually creates something similar, for ordering a list of individual kanji according to their components. That is, basically ordering any list of kanji similarly to the way they are ordered in Remembering the Kanji, so that you only study kanji for which you have already studied all their components.
Thanks for the suggestion. Not sure if I'll ever get around to actually implementing it, though.

cb's Kanji Word Association Tool - ja_min - 2013-07-06

Perhaps for the dictionary, users could customize the frequency field to reflect words' frequencies from other lists (e.g. generated by your JTAT). This way they could micromanage the words they learn based on the media they want to engage with. Or there could be an option for using your own dictionary file, made up of custom entries. Or JTAT could generate a glossary file for use by KWAT. (Or EPWING?) In some cases, then, a GUI feature might make it easier to manage them than constantly going into KWAT's folder and renaming the .dat file, and perhaps having to keep manipulating the field tokens if the dictionaries have different layouts.

For example, let's say someone takes a light novel or subtitles, generates a suite of reports using JTAT, and learns the kanji from the book or video. As those kanji become mature, ideally one would next learn the original words the kanji were taken from, allowing them to engage with the media they're interested in right away. How could this be done in a streamlined fashion?
Setting aside multiple dictionaries as a feature, using a different frequency field tailored to the original text(s) JTAT generated a report from would give one an idea of what to learn and/or prioritize, and what is extraneous but perhaps useful for expanding the learner's horizons. And of course, the dictionary-related options would allow for a truly bespoke process.

Just some speculations. Each time you create something, I can't help but get inspired and try to brainstorm improvements right away. You know how I do. ;p

cb's Kanji Word Association Tool - cb4960 - 2013-07-07

As I recall, it was one of these brainstorming sessions that inspired the creation of KWAT in the first place, so keep them coming. Though sometimes it can take a while to get from initial concept to implementation.

I suppose the least I could do is release a tool that can generate the dict.dat file. It would take EDICT and the output of JTAT as inputs. Second from least would be to add a GUI option to select which dict.dat to use.

cb's Kanji Word Association Tool - ja_min - 2013-07-08

cb4960 Wrote: As I recall, it was one of these brainstorming sessions that inspired the creation of KWAT in the first place, so keep them coming. Though sometimes it can take a while to get from initial concept to implementation.
Cool!

cb's Kanji Word Association Tool - ja_min - 2013-07-12

So by taking the JTAT output as inputs, I hope this might also include the kanji frequency report? Not as the source dictionary, of course, but for the kanji list that is input into KWAT. At the moment, you have to strip out extraneous information.

Perhaps it could even have an option to prioritize by the most frequent kanji in the report, if KWAT could not only process kanji frequency report files without requiring the user to strip them first, but also use the kanji frequencies.
I tried to replace the .dat file manually (so that the C10k sentences' unique words would be the source file), but as I should've guessed, it didn't work, so I'm looking forward to future versions. No rush! ;p

cb's Kanji Word Association Tool - cb4960 - 2013-07-12

ja_min Wrote: So by taking the JTAT output as inputs, I hope this might also include the kanji frequency report? Not as the source dictionary of course, but for the kanji list that is input into KWAT. At the moment, you have to strip out extraneous information.
Currently KWAT assumes that the kanji or word is in the first column, but I'll make this an option in the next release. So for the JTAT kanji report, you would just tell KWAT to use column 2.

ja_min Wrote: Perhaps it could even have an option to prioritize by the most frequent kanji in the report, if KWAT could not only process kanji frequency report files without requiring the user to strip them first, but also use the kanji frequencies.
So you're suggesting an option to have words be selected based on the frequency of the kanji that make up the word rather than the frequency of the word itself? I'll have to think about that one.

ja_min Wrote: I tried to replace the .dat file manually (so that the C10k sentences' unique words would be the source file) but as I should've guessed, it didn't work, so looking forward to future versions. No rush! ;p
Ahh, the pressure!!! I'll try to release the .dat generator this weekend.

cb's Kanji Word Association Tool - ja_min - 2013-07-12

Great! I'll notify my Wall Street clients that KWAT shares will soon skyrocket.

cb4960 Wrote: So you're suggesting an option to have words be selected based on the frequency of the kanji that make up the word rather than the frequency of the word itself?
I'll have to think about that one.
Also, I meant that the resulting list would be ordered by kanji frequency, so in that sense word frequency is a secondary sort. But that was just something I tossed in as a bonus, and I imagine in most cases the list would already be sorted that way (e.g. the JTAT report is in frequency order, and that order is kept in the KWAT list).

cb's Kanji Word Association Tool - cb4960 - 2013-07-12

Ah, got it.

cb's Kanji Word Association Tool - cb4960 - 2013-07-13

Just added a download for Kanji Word Association Tool Dictionary Generator to the second post in this thread.

cb's Kanji Word Association Tool - ja_min - 2013-07-13

cb4960 Wrote: Just added a download for Kanji Word Association Tool Dictionary Generator to the second post in this thread.
Sensational! Thanks. ;p I wrote this up if you're curious what I've made of it: http://darkjapanese.tumblr.com/post/55326290518/kwat-g

Edit: Another thought. Maybe it's there and I missed it, or maybe it's not included because the original dictionary would've covered every kanji, but perhaps there could be a notification when kanji in the list don't have matches in the dictionary? This is unlikely if the kanji learned and used as input come from the same source the dictionary is generated from (a list of C2k kanji will of course have matches in the C2k dict.dat), but when using a smaller, custom dictionary, a lack of matches might happen in various circumstances. So perhaps it could list the kanji that don't have matches so you can use them as input for a new dictionary association, and/or give the option to pull words from another dictionary (once multiple dict.dats are supported).

cb's Kanji Word Association Tool - cb4960 - 2013-07-13

I have just uploaded version 2.0 of Kanji Word Association Tool. Download version 2.0 via SourceForge.

What's New?
● Added column selectors for the kanji list file and known words file.
● Placeholders are no longer generated for kanji that have no associated words when "Place all entries on a single line" is selected. See next point.
● Added the "kwat_kanji_without_associations.txt" output file. Kanji that do not have any entries associated with them are placed into this file. If all kanji have associations, this file is not generated.
● Added an "Advanced Options" dialog. Currently the only advanced option is to specify which dictionary file to use.
● Changed "Output file" to "Output directory".
● The main output file is now named "kwat_kanji_words.txt" and cannot be changed by the user.

cb4960
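The 2.0 input and output changes can be pictured with the hypothetical Python sketch below; it is not KWAT's code. It assumes tab-separated input files and a 1-based column number (so a JTAT kanji report would use column 2, as suggested earlier in the thread), and it uses a made-up one-line-per-kanji output layout in the spirit of "Place all entries on a single line". Kanji without words get no placeholder line in kwat_kanji_words.txt and are instead listed in kwat_kanji_without_associations.txt, which is only written when at least one such kanji exists.

```python
import os


def kanji_from_lines(lines, column=1):
    """Pick the value in a 1-based, tab-separated column from each non-blank line.

    Assumption: input is TSV text such as a JTAT kanji frequency report,
    where the kanji sit in column 2 rather than column 1.
    """
    values = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= column and fields[column - 1].strip():
            values.append(fields[column - 1].strip())
    return values


def render_outputs(associations):
    """Return (words_text, missing_text); missing_text is None when every kanji has words.

    associations: dict of kanji -> list of words, in study order.
    The tab-separated line layout is hypothetical, not KWAT's documented format.
    """
    lines = [k + "\t" + "\t".join(ws) for k, ws in associations.items() if ws]
    missing = [k for k, ws in associations.items() if not ws]
    words_text = "".join(line + "\n" for line in lines)
    missing_text = "".join(k + "\n" for k in missing) if missing else None
    return words_text, missing_text


def write_outputs(output_dir, associations):
    """Write the fixed-name output files into output_dir (created if needed)."""
    os.makedirs(output_dir, exist_ok=True)
    words_text, missing_text = render_outputs(associations)
    with open(os.path.join(output_dir, "kwat_kanji_words.txt"), "w", encoding="utf-8") as f:
        f.write(words_text)
    if missing_text is not None:  # only created when some kanji have no words
        path = os.path.join(output_dir, "kwat_kanji_without_associations.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(missing_text)
```

Reading a report then looks like `with open("jtat_kanji_report.txt", encoding="utf-8") as f: kanji = kanji_from_lines(f, column=2)`, where the file name is only an example. Keeping the rendering separate from the file writing makes the output format easy to check without touching the disk.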