cb's Kanji Word Association Tool - Printable Version
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Learning resources (http://forum.koohii.com/forum-9.html)
+--- Thread: cb's Kanji Word Association Tool (/thread-10932.html)

cb's Kanji Word Association Tool - ja_min - 2013-07-13

You're a wonderful person.

cb's Kanji Word Association Tool - ja_min - 2013-07-16

I've run into an issue w/ KWAT's advanced options dictionary selector. It seems that once I run an analysis with a dict.dat, it will only use that dict.dat for subsequent analyses until the program is closed and run again. So if you run it with one dict.dat, then switch to another w/ the options, it won't use the new one till you restart. But if you just open KWAT and switch to another dict.dat without running it w/ the one you started the program with, it'll use the new one without requiring a restart.

cb's Kanji Word Association Tool - ja_min - 2013-07-16

Also, using the original dict.dat (for either version), KWAT for some reason can't seem to find associations for these kanji: http://pastebin.com/qKU2V4Af

So for example, it can't seem to find this entry, for some reason: http://jisho.org/words?jap=%E9%83%8A%E9%87%8E&eng=&dict=edict - Even though the second kanji is used in multiple words w/ other kanji from the list. (The first kanji from the word is in the "no associations" list.)

By the way, the "maximum number of entries" doesn't seem to be honored, either. ;p

cb's Kanji Word Association Tool - cb4960 - 2013-07-16

Thanks for testing. I'll look into those issues soon.

cb's Kanji Word Association Tool - ja_min - 2013-07-18

cb4960 Wrote: Thanks for testing. I'll look into those issues soon.

Cool. Take your time, I learned my lesson and stopped making promises to those dangerous people. ^_-

In the meantime, here's added incentive to switch to Anki 2! Well, not really, unless you're doing o+1 cards. ;p Thanks again for your help w/ the Sentence Gloss Shuffle add-on: http://darkjapanese.tumblr.com/post/55759450816/ja-resources-shared-at-ankiweb

cb's Kanji Word Association Tool - masshemorrhoids - 2013-07-18

Thanks for the program, cb4960. This looks to be quite helpful.

How are you guys adding the kanji you want to generate vocab from Anki? Do you just copy them individually into a separate file for KWAT, or is there some automated method? I've tried exporting a list from Anki but it's not formatted the way KWAT requires with 1 kanji per line.

cb's Kanji Word Association Tool - cb4960 - 2013-07-18

ja_min Wrote: Also, using the original dict.dat (for either version), KWAT for some reason can't seem to find associations for these kanji: http://pastebin.com/qKU2V4Af

I can't seem to reproduce this. I used the first item in your link, 輸, as a test case. Can you provide the lists that you used and the steps you took?

ja_min Wrote: So for example, it can't seem to find this entry, for some reason: http://jisho.org/words?jap=%E9%83%8A%E9%87%8E&eng=&dict=edict - Even though the second kanji is used in multiple words w/ other kanji from the list. (The first kanji from the word is in the "no associations" list.)

I also can't reproduce this. 郊野 is rather infrequent (it's only the 152,936th most frequent word), so other entries are probably taking priority.

ja_min Wrote: By the way, the "maximum number of entries" doesn't seem to be honored, either. ;p

Can't reproduce this one either, but keep in mind that it is not "Maximum number of entries", but rather "Maximum number of entries to associate with each kanji". So if you set it to 3, then KWAT will generate at most 3 entries for any given kanji in your list of kanji. And if you have 100 kanji in your list, then a maximum of 300 entries will be output to "kwat_kanji_words.tsv".

cb's Kanji Word Association Tool - cb4960 - 2013-07-18

masshemorrhoids Wrote: I've tried exporting a list from Anki but it's not formatted the way KWAT requires with 1 kanji per line.

How is the list that is exported from Anki formatted?

cb's Kanji Word Association Tool - ja_min - 2013-07-19

As for max entries, I suspect that you or the program means it differently than I interpreted.
I assumed that it would only show one word per kanji, period, but when I set it to 1 or 7, for example, I'll get at least 3 and as many as 36 words containing the same kanji. So I'm thinking that maybe the program will use that kanji in other words if those words are, say, more common, it just won't conceptually be the primary kanji for those words?

Interestingly, the kanji in the above paste that are in the RTK Lite list have associations when I use the latter. If you still have that sorted spreadsheet for the 1945 kanji, I used that (here: http://darkjapanese.tumblr.com/post/40510633175/bushu).

Edit 2: Damn, no wait, that's not what I used. ;p I used kanji I extracted from something or other. Yeah... well, I'll try and duplicate it with another list or something. After I sleep.

In the meantime:
Input file: http://www.putlocker.com/file/FC4E18C4C6D62079
Associations: http://www.putlocker.com/file/EC285C202DC3FCB8
Kanji w/o associations: http://www.putlocker.com/file/06D5536FF5289D18
Max entries set to 5, single kanji words disabled.

Does that seem right? It seems to me that some of those kanji have "common word" compounds showing up via online dictionaries and should therefore have at least one association. Even if they were prevented from having an association because the other kanji in the compound is used to the max (doesn't seem fair, does it? ;p), in fact some of the kanji, at least, are used more than the max entries regardless. Maybe the max setting is only being applied to these kanji, for some reason? Because allowing single kanji words does shorten the no-associations list considerably.

Edit 3: OK, check this out: http://www.putlocker.com/file/B52C999A138FFC58 With either of those files created from the test.txt, you'll notice that the single kanji in the w/o associations list does have associations. Note that multiple words use the same kanji when max entries is set to 1, so I'm thinking it is related to the above difference of interpretations.

Edit 4: Gah! I compared the 1945 test files and it seems that the no-assoc file has been misleading me by saying certain kanji don't have associations, when in fact they have multiple associations, as first and/or second constituents (C1 or C2, for first or second kanji) in compounds. I swear I checked for that very discrepancy to begin with, before ever posting on the topic, but maybe I dreamt it during my recent flurry of updates.

cb's Kanji Word Association Tool - cb4960 - 2013-07-19

I have just uploaded version 2.1 of Kanji Word Association Tool.

Download version 2.1 via SourceForge

What's New?
● Added advanced option: "Allow associations to entries consisting of any kanji (not just those already encountered or those in the user's kanji list)".
● Added advanced option: "Output a list of kanji in the user's kanji list that were not used by any of the associated entries".
● Fixed bug that prevented entries without frequency from being used for associations.
● Fixed bug where the selection of the dictionary file (in Advanced Options) could only be performed once.

cb4960

cb's Kanji Word Association Tool - cb4960 - 2013-07-19

@ja_min, I think there might be some confusion regarding the default function of this tool. For each kanji, KWAT will only associate words that contain kanji that have already been encountered in the kanji input list.

For example, let's use this small list of 3 kanji as input:

造
建
設

When looking up associations for 建, KWAT will only associate words that consist of some combination of 造, 建, and kana. So, 建造 would be a valid association because it consists of 造 and 建. 建つ would also be a valid association because it consists of 建 and the kana つ. However, 建設 would not be a valid association for 建 because 設 hasn't been encountered yet. That said, when looking up associations for the next kanji, 設, 建設 is a valid association because now both 建 and 設 have been encountered.
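
A rough Python sketch of the default rule described above, for illustration only. It is not KWAT's actual code: the `associate` function, the toy dictionary, and its frequency ranks are all made up here, and sorting candidates by frequency rank is an assumption based on the earlier remark that more frequent entries take priority.

```python
import re

# Matches kanji in the basic CJK Unified Ideographs block only (an
# assumption of this sketch; rarer kanji and the repeat mark are ignored).
KANJI = re.compile(r"[\u4e00-\u9fff]")


def associate(kanji_list, dictionary, max_per_kanji=3):
    """Map each kanji to words built only from kanji seen so far (plus kana).

    `dictionary` maps word -> frequency rank (1 = most frequent).
    Both the name and the data layout are hypothetical.
    """
    seen = set()
    result = {}
    for kanji in kanji_list:
        seen.add(kanji)  # the current kanji counts as encountered
        candidates = [
            (rank, word)
            for word, rank in dictionary.items()
            if kanji in word and set(KANJI.findall(word)) <= seen
        ]
        candidates.sort()  # lowest rank number (most frequent) first
        result[kanji] = [word for _, word in candidates[:max_per_kanji]]
    return result


# Toy data: the ranks are invented for the example.
toy_dictionary = {"建造": 3, "建つ": 1, "建設": 2, "造る": 4}
associations = associate(["造", "建", "設"], toy_dictionary)
for kanji, words in associations.items():
    print(kanji, words)
```

With the 造, 建, 設 list, 建設 is skipped for 建 and only turns up under 設, which matches the walk-through above. `max_per_kanji` plays the role of "Maximum number of entries to associate with each kanji".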

By using this approach, the learner only sees entries based on the kanji that they have already learned, assuming that they are learning in the order given in the input list.

However, since I love options, you can override this behavior:

1) If you want to allow associations to entries consisting of any kanji in the user's kanji list instead of those already encountered, set the Look-ahead range option to Dynamic and set R = 99999.
2) If you want to allow associations to entries consisting of any kanji (not just those already encountered or those in the user's kanji list), then enter the advanced options and check this option.

cb's Kanji Word Association Tool - ja_min - 2013-07-20

Thanks for the tip. ;p

cb's Kanji Word Association Tool - masshemorrhoids - 2013-07-20

cb4960 Wrote: How is the list that is exported from Anki formatted?

Here's 1 line from the exported file opened in notepad:

1 一 one 1 1 一
2 二 two 2 1 二

Here's what it looks like opened in excel:

1 筝 one 1 1 筝
2 篋 two 2 1 篋
3 筝 three 3 1 筝
4 ・ four 5 1 ・
5 篋 five 4 1 篋

It's pretty confusing, so right now, I'm modifying copies of your rtk sample list. This works well enough. Thanks for everything. This is a great tool!

cb's Kanji Word Association Tool - cb4960 - 2013-07-20

masshemorrhoids Wrote:
cb4960 Wrote: How is the list that is exported from Anki formatted?
Here's 1 line from exported filed opened in notepad:

If each field in the exported file is separated by a tab, then KWAT shouldn't have a problem with it. Just make sure that the column option is set to 1.

cb's Kanji Word Association Tool - ja_min - 2013-07-22

Some very speculative speculation for you to speculate on regarding speculative future endeavors... brace yourself:

vkSAT - cb's (Vocabulary/Kanji) Sentence Association Tool:

Take a source such as the Core 2000 sentences.
Extract the words... per sentence.
Extract the kanji... per word per sentence.
Learn the total unique kanji that are associated only with the total unique words in a given sentence, for a set of X sentences.

The sentences would be ordered so that each successive sentence contains the minimum number of novel unique words beyond the total unique words of the previous sentences. Perhaps the initial sentence would be the shortest.

Assuming steps 1: kanji, 2: vocabulary, 3: sentences, then step 1 would lead to step 3 with minimal deviation and would lead to the most gain from the least investment.

At any rate, it's a brainstorm-in-progress... I've just put the finishing touches on a similar but less precise method.

To put it another way, the aim of the early stages of ja-minimal is to learn the least number of kanji in any given unit of study time that will boost learning of the most words that would lead to the greatest number of utterances with entirely mature vocabulary which are used for spaced listening comprehension tasks, where utterance sources are selected for transferability outside Anki and balanced between function, relevance, and engagement. In later sections the same principle is applied to output, minus the kanji step (see ja-minimal sections I.B.1, I.B.2, and I.C. and later II.A.2.-II.B.2.).

cb's Kanji Word Association Tool - cb4960 - 2013-07-23

Interesting. Might take me a while to grok all of that.

cb's Kanji Word Association Tool - ja_min - 2013-07-23

Before you can grok it, you have to grep it, with the regular expression of your heart.

cb's Kanji Word Association Tool - cb4960 - 2013-10-27

I have just uploaded version 2.2 of Kanji Word Association Tool.

Download version 2.2 via SourceForge

What's New?
● Updated dict.dat (EDICT/Frequency database).

cb4960

cb's Kanji Word Association Tool - aldebrn - 2014-11-14

ja_min Wrote: Some very speculative speculation for you to speculate on regarding speculative future endeavors... brace yourself:

@ja_min, I'm not 100% sure but I think this is brilliant.
Are you still here so I can pick your brain about this? And about the minimal method in general?
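
For anyone wanting to experiment with the sentence ordering ja_min proposed earlier in the thread (each successive sentence adds as few new words as possible), here is a minimal greedy sketch in Python. It is only one possible reading of the idea, not an existing tool: `order_sentences` and the sample sentences are hypothetical, and the sentences are assumed to be already split into words, since real Japanese text would need a tokenizer first.

```python
def order_sentences(sentences):
    """Greedy ordering: repeatedly pick the sentence with the fewest unseen words.

    `sentences` is a list of word lists (pre-tokenized, an assumption of
    this sketch). Ties are broken in favor of the shorter sentence.
    """
    remaining = list(sentences)
    known = set()      # words covered by the sentences chosen so far
    ordered = []
    while remaining:
        best = min(remaining, key=lambda s: (len(set(s) - known), len(s)))
        remaining.remove(best)
        ordered.append(best)
        known |= set(best)
    return ordered


# Made-up sample data, already split into words.
toy = [
    ["私", "は", "学生", "です"],
    ["私", "は", "日本語", "を", "勉強", "します"],
    ["学生", "です"],
]
ordered = order_sentences(toy)
for sentence in ordered:
    print(" ".join(sentence))
```

Because nothing is known at the start, the first pick is simply the sentence with the fewest unique words, which lines up with the suggestion that the initial sentence be the shortest. A greedy pick like this is not guaranteed to give the best overall ordering; it is just the simplest version to try.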