(2016-11-20, 10:05 pm) phil321 Wrote:
(2016-11-20, 8:56 pm) b_j_b Wrote: I made this out of curiosity.
Just putting it out there if it helps or entertains anyone.
Thanks. I took a look at the graph. I guess my question is, how can it be used to help learn kanji compounds?
I'm using RTK 1-3 and recommend it. I was just curious what the relationships would look like when visualized.
(2016-11-21, 6:42 am) Zarxrax Wrote: How many kanji/words does this contain?
The small graph has 69 kanji and 45 edges (lines connecting pairs of kanji).
The large graph has 257 kanji and 211 edges.
See below for details.
(2016-11-21, 12:57 pm) SomeCallMeChris Wrote:
(2016-11-21, 11:18 am) ファブリス Wrote:
b_j_b Wrote: Shows relationships between kanji of the most numerous compounds in the Japanese language.
What are those relationships?
I think what he means to say is that the graph links characters that appear together in many compounds.
There's either a link or there isn't, so it's just showing relationships that are above a certain threshold.
I'm not sure what value this graph has; it's mostly a random distribution of two-character terms that are often used as prefixes or suffixes in longer words.
The densely connected bits are somewhat more interesting, but I don't really know what meaning to take from that.
I feel like a similar graph for 'most common kanji compounds' would be more interesting, but I don't know how useful that would be either, although it would be a nice reference for creating or solving crossword puzzles!
Steps taken to generate nodes (kanji) and edges (relationships between kanji):
1. Download JMDict_e.gz from http://www.edrdg.org/jmdict/j_jmdict.html
2. Extract all <keb> entries that contain only 常用漢字 (jōyō kanji)
3. For each <keb> entry, add 1 to the relationship weight of every pair of kanji in that entry
4. Only retain relationships with weight >= n (n=100 for large graph, n=50 for small graph)
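For anyone who wants to try this, steps 2-4 might look roughly like the Python below. This is a sketch, not b_j_b's actual script. It assumes the jōyō kanji are supplied as a set (the list isn't bundled with Python, so the four-character set in the demo is a hypothetical stand-in), treats a headword as eligible only if every character is jōyō, and counts each unordered pair of distinct kanji once per headword so a repeated character doesn't create a self-loop. `iter_kebs`, `build_edges` and `nodes_of` are made-up names.

```python
import io
import itertools
import xml.etree.ElementTree as ET
from collections import Counter
from typing import IO, Dict, Iterable, Iterator, Set, Tuple

Edge = Tuple[str, str]


def iter_kebs(xml_file: IO[bytes]) -> Iterator[str]:
    """Yield the text of every <keb> (kanji headword) element in a JMdict XML stream."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "keb" and elem.text:
            yield elem.text
        elif elem.tag == "entry":
            elem.clear()  # entry fully processed; drop its children to save memory


def build_edges(kebs: Iterable[str], joyo: Set[str], n: int) -> Dict[Edge, int]:
    """Steps 2-4: filter headwords, weight kanji pairs, keep pairs with weight >= n."""
    weights: Counter = Counter()
    for keb in kebs:
        # Step 2: skip headwords containing anything that is not a jōyō kanji.
        if not all(ch in joyo for ch in keb):
            continue
        # Step 3: +1 for every unordered pair of distinct kanji in this headword.
        for pair in itertools.combinations(sorted(set(keb)), 2):
            weights[pair] += 1
    # Step 4: retain only relationships with weight >= n.
    return {pair: w for pair, w in weights.items() if w >= n}


def nodes_of(edges: Dict[Edge, int]) -> Set[str]:
    """Kanji that appear in at least one retained edge."""
    return {kanji for pair in edges for kanji in pair}


if __name__ == "__main__":
    # Tiny hypothetical stand-ins for the JMdict file and the jōyō list.
    sample = """<JMdict>
<entry><k_ele><keb>学生</keb></k_ele></entry>
<entry><k_ele><keb>大学生</keb></k_ele></entry>
<entry><k_ele><keb>学校</keb></k_ele></entry>
<entry><k_ele><keb>生きる</keb></k_ele></entry>
</JMdict>""".encode("utf-8")
    joyo = set("学生大校")
    edges = build_edges(iter_kebs(io.BytesIO(sample)), joyo, n=2)
    print(edges)  # prints {('学', '生'): 2}
    print(nodes_of(edges))
```

For the real data, the downloaded JMDict_e.gz could presumably be opened with `gzip.open(path, "rb")` and passed to `iter_kebs`; `iterparse` streams the file rather than loading it whole, and clearing each `<entry>` keeps memory use down. Counting nodes as the kanji that survive in at least one edge (`nodes_of`) is an assumption about how the 69/257 figures were obtained.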
In short, the weights and filtering are based on how many dictionary entries contain each pair (or tuple) of kanji, not on how often a word occurs in a Japanese corpus.
When I get the time, I'll redo the experiment based on word frequency, probably with data linked from http://ftp.monash.edu.au/pub/nihongo/00INDEX.html
Weighting and filtering by word frequency should make the graph more relevant to people studying for fluency.
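One possible reading of "weighted by word frequency" is to keep the same pipeline but have each word add its corpus frequency to its kanji pairs instead of 1. The Python sketch below works under that assumption. `build_frequency_edges`, the `word_freq` mapping and the numbers in the demo are hypothetical. Parsing the frequency files linked from the Monash index is left out because their format isn't described in this thread.

```python
import itertools
from collections import defaultdict
from typing import Dict, Mapping, Set, Tuple

Edge = Tuple[str, str]


def build_frequency_edges(
    word_freq: Mapping[str, int], joyo: Set[str], threshold: int
) -> Dict[Edge, int]:
    """Like the dictionary-count version, but each word adds its corpus
    frequency (not 1) to every pair of distinct kanji it contains."""
    weights: Dict[Edge, int] = defaultdict(int)
    for word, freq in word_freq.items():
        if not all(ch in joyo for ch in word):
            continue  # keep the jōyō-only restriction from step 2
        for pair in itertools.combinations(sorted(set(word)), 2):
            weights[pair] += freq
    # Keep only pairs whose summed frequency reaches the threshold.
    return {pair: w for pair, w in weights.items() if w >= threshold}


if __name__ == "__main__":
    # Hypothetical frequencies; real counts would come from a corpus word-frequency list.
    freqs = {"学生": 500, "大学生": 120, "学校": 300, "生きる": 900}
    print(build_frequency_edges(freqs, set("学生大校"), threshold=300))
    # prints {('学', '生'): 620, ('学', '校'): 300}
```

With this weighting, a pair carried by a few very common words can outrank one spread across many rare dictionary entries. The threshold would also need retuning, since it is now in frequency units rather than entry counts.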
Edited: 2016-11-21, 5:36 pm