Kanji compounds/words per Lesson of Book 1 ??

#26
aldebrn Wrote:What did you think of this list I generated using a list of 2200 kanji from RTK1 and Edict as a corpus: https://gist.github.com/fasiha/42026df0e...ubatsu-txt
It's interesting to see those results. Personally, I'm more aiming for a way to practically test myself by:
(1) inputting a frame number (i.e. input 150 to include all kanji in frames 1-150)
(2) generate a list using words with those kanji and then
(3) randomly show the English+Kana only and prompt me to write the Kanji.

The Labs does a vocab shuffle just like that except it shows the Kanji and prompts you to provide the meaning and reading. What I'm trying to learn how to do is the other way round (if that makes sense).
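
Steps (1) to (3) above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions: `RTK_ORDER`, `WORDS`, and the function names are made up for the example, the data is a toy sample rather than a real RTK or dictionary export, and the kanji check only covers the main CJK block.

```python
import random

# Hypothetical sample data: kanji in RTK frame order, and (word, kana, English) entries.
RTK_ORDER = ["一", "二", "三", "口", "日", "月"]
WORDS = [
    ("一日", "いちにち", "one day"),
    ("三月", "さんがつ", "March"),
    ("人口", "じんこう", "population"),  # 人 is not in RTK_ORDER above
]

def is_kanji(ch):
    # CJK Unified Ideographs block; an approximation that ignores rarer extensions.
    return "\u4e00" <= ch <= "\u9fff"

def words_up_to_frame(frame, rtk_order, words):
    """Keep words that contain kanji, all of which fall in frames 1..frame."""
    known = set(rtk_order[:frame])
    result = []
    for word, kana, english in words:
        kanji = [ch for ch in word if is_kanji(ch)]
        if kanji and all(ch in known for ch in kanji):
            result.append((word, kana, english))
    return result

def quiz_prompts(frame, rtk_order, words, rng=random):
    """Return (prompt, answer) pairs in random order; the prompt hides the kanji."""
    selected = words_up_to_frame(frame, rtk_order, words)
    rng.shuffle(selected)
    return [(f"{english} ({kana})", word) for word, kana, english in selected]
```

Calling `quiz_prompts(150, ...)` with real data would give the English+kana prompts for frames 1-150; you write the kanji by hand and check it against the answer yourself.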
#27
choubatsu Wrote:It's interesting to see those results. Personally, I'm more aiming for a way to practically test myself by:
(1) inputting a frame number (i.e. input 150 to include all kanji in frames 1-150)
(2) generate a list using words with those kanji and then
(3) randomly show the English+Kana only and prompt me to write the Kanji.

The Labs does a vocab shuffle just like that except it shows the Kanji and prompts you to provide the meaning and reading. What I'm trying to learn how to do is the other way round (if that makes sense).
Very interesting. It does make sense, & sounds like a great idea.

General question: do you *not* want to integrate this with Anki? I.e., you could generate your list up to (2) above, add them to Anki, and learn them via SRS: this is what I thought you originally wanted to do, and the text file I generated could be chopped up to be imported into Anki. But it sounds like you are more interested in non-SRS quizzing, since you already know the words?

Another general question: you're going to hand-write the kanji, right? So the quiz app wouldn't do any grading: you'd tell it "I got it" or "I didn't get it"?

Questions about #3: where would the English come from? Would you be satisfied with a dictionary lookup?

Also something to note is that MeCab's kanji-to-yomi or compound-finding is not 100% accurate. In Nayr's Core5000 deck, I count at least six sentences (out of five thousand) where MeCab processing was corrected by hand. ... Well, maybe it's not worth worrying about ~0.1% failures, but it's something to be aware of :P

What about this idea: if you paste in real text (rather than a dictionary), perhaps the quizzing app could use cloze-deletion? You see the surrounding few sentences, and have to fill in the blank, maybe with MeCab-derived kana as a hint, and no need for English. I can see that as being more engaging and less robotic than kana/Eng=>kanji. What do you think?
#28
aldebrn Wrote:I generated your list of compounds using this software: https://github.com/fasiha/compounds-per-...er/code.js It's about as stupid an implementation as you can imagine, which is why it's terrifically slow. (I am pretty sure it can be sped up using the technique @lauri_ranta demonstrated...)
That's an interesting list, thanks.

Is it possible to use computer assisted techniques to extract from this list of compounds a sublist consisting of the minimum number of compounds necessary to include all the individual kanji, with the least amount of duplication?

Thanks in advance.
#29
john555 Wrote:Is it possible to use computer assisted techniques to extract from this list of compounds a sublist consisting of the minimum number of compounds necessary to include all the individual kanji, with the least amount of duplication?
Yes (http://en.m.wikipedia.org/wiki/Set_cover_problem), but how do you feel about one-kanji compounds? Your wording of the problem makes me think you want to omit those.
Edited: 2014-10-31, 7:20 pm
#30
aldebrn Wrote:
john555 Wrote:Is it possible to use computer assisted techniques to extract from this list of compounds a sublist consisting of the minimum number of compounds necessary to include all the individual kanji, with the least amount of duplication?
Yes (http://en.m.wikipedia.org/wiki/Set_cover_problem), but how do you feel about one-kanji compounds? Your wording of the problem makes me think you want to omit those.
Oh yes, please exclude all "one kanji" compounds. The greater the number of kanji in each compound the better!

So for example, if we were looking at the 2,042 kanji in RTK1 (5th edition) then theoretically it would be nice to have 680 separate three kanji compounds plus one two kanji compound. But I know that's not realistic. So maybe the number of compounds is somewhere between 681 and 1,021.
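
The arithmetic behind those two bounds can be checked with a few lines. This is just a sketch; the function name and its parameters are made up here, and it assumes every kanji is used exactly once.

```python
import math

def compound_count_bounds(num_kanji, max_len=3, min_len=2):
    """Bounds on the number of compounds if every kanji appears exactly once."""
    fewest = math.ceil(num_kanji / max_len)  # use the longest compounds wherever possible
    most = num_kanji // min_len              # use the shortest allowed compounds
    return fewest, most

print(compound_count_bounds(2042))  # (681, 1021)
```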
Edited: 2014-10-31, 7:54 pm
#31
john555 Wrote:exclude all "one kanji" compounds. The greater the number of kanji in each compound the better!

So for example, if we were looking at the 2,042 kanji in RTK1 (5th edition) then theoretically it would be nice to have 680 separate three kanji compounds plus one two kanji compound. But I know that's not realistic. So maybe the number of compounds is somewhere between 681 and 1,021.
A solution containing 1126 compounds covering 2004 of RTK1v6's 2200 kanji is at https://gist.github.com/fasiha/54b2ff1b3f521d90cd66

Alas, the EDICT dictionary used doesn't contain 78 of RTK1v6's 2200, and another 118 only appear as single non-compounded kanji, hence the non-complete coverage. There may be shorter lists, since I used a simple greedy algorithm, but they'll cover the same number of kanji. (Fun fact: guaranteeing the absolute shortest list by brute force would mean examining every subset of the candidate compounds, roughly 2^13000 possibilities if 13000 is taken as the number of candidates---cue comparisons to the number of atoms in the universe, or its age in nanoseconds.)

Ideas for improvement: try with JMdict (successor to now-deprecated EDICT, but a bit harder to parse).
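
For anyone curious, a greedy set-cover pass of the kind mentioned above can be sketched like this. It is a minimal illustration and not the code that produced the gist: `greedy_cover` and the sample data are made up, and it assumes every compound is a plain string of kanji with no kana mixed in.

```python
def greedy_cover(compounds, target_kanji):
    """Greedy set cover: repeatedly pick the compound that covers the most uncovered kanji."""
    uncovered = set(target_kanji)
    candidates = [c for c in compounds if len(c) >= 2]  # drop one-kanji entries
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda c: len(uncovered & set(c)), default=None)
        if best is None or not uncovered & set(best):
            break  # the remaining kanji do not appear in any multi-kanji compound
        chosen.append(best)
        uncovered -= set(best)
    return chosen, uncovered

# Toy data: 山 only occurs as a single kanji, 川 does not occur at all.
chosen, missing = greedy_cover(["日本", "本日", "日本語", "語学", "学", "山"], "日本語学山川")
print(chosen)   # ['日本語', '語学']
print(missing)  # the kanji 山 and 川 stay uncovered
```

Greedy selection does not guarantee the shortest possible list, but every list built from the same candidates covers the same kanji, which matches the incomplete coverage described above.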
#32
aldebrn Wrote:
john555 Wrote:exclude all "one kanji" compounds. The greater the number of kanji in each compound the better!

So for example, if we were looking at the 2,042 kanji in RTK1 (5th edition) then theoretically it would be nice to have 680 separate three kanji compounds plus one two kanji compound. But I know that's not realistic. So maybe the number of compounds is somewhere between 681 and 1,021.
A solution containing 1126 compounds covering 2004 of RTK1v6's 2200 kanji is at https://gist.github.com/fasiha/54b2ff1b3f521d90cd66

Alas, the EDICT dictionary used doesn't contain 78 of RTK1v6's 2200, and another 118 only appear as single non-compounded kanji, hence the non-complete coverage. There may be shorter lists, since I used a simple greedy algorithm, but they'll cover the same number of kanji. (Fun fact: guaranteeing the absolute shortest list by brute force would mean examining every subset of the candidate compounds, roughly 2^13000 possibilities if 13000 is taken as the number of candidates---cue comparisons to the number of atoms in the universe, or its age in nanoseconds.)

Ideas for improvement: try with JMdict (successor to now-deprecated EDICT, but a bit harder to parse).
That's great, thanks!
#33
What's the point? Just for fun to see how small such a list can be?
#34
yudantaiteki Wrote:What's the point? Just for fun to see how small such a list can be?
I too am curious. Are you going to try and write an epic poem using only these compounds? If so, maybe a better-ranked list would be better :)
#35
choubatsu Wrote:Hi everyone,

I'm wondering if anyone knows of any lists which contain kanji compounds or words listed per Lesson of RTK1? I learned Japanese through a totally different system and my reading ability is advanced. I know all the readings of the kanji and how to read thousands of compounds already. But my writing skills are weak. I am going through RTK1 now, even though I know it's not recommended to do it backwards.
If you have a list of vocabulary, you can order it by RTK order by yourself. Just use cb's Kanji Word Association Tool:

Quote:Kanji Word Association Tool was created for students who want to learn kanji and words at the same time in the most optimal fashion possible. Based on a user-provided list of kanji, this tool will generate a list of words that are associated with each kanji and ensure that each word consists only of kanji that you have already studied up to that point and kana. In addition, words are sorted by frequency and no duplicate words are used.
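
To make the quoted description concrete, here is one way the idea could be sketched. This is my own guess at the behaviour, not the tool's actual code: `associate_words` and the sample lists are hypothetical, and the input word list is assumed to be sorted by frequency already.

```python
def is_kanji(ch):
    # CJK Unified Ideographs block; an approximation that ignores rarer extensions.
    return "\u4e00" <= ch <= "\u9fff"

def associate_words(kanji_order, words_by_frequency):
    """List each word under the last-studied kanji it contains, with no duplicates."""
    position = {k: i for i, k in enumerate(kanji_order)}
    assigned = {k: [] for k in kanji_order}
    seen = set()
    for word in words_by_frequency:  # most frequent first, so each list stays frequency-sorted
        kanji = [ch for ch in word if is_kanji(ch)]
        if word in seen or not kanji or any(ch not in position for ch in kanji):
            continue  # duplicate, kana-only, or contains a kanji outside the study list
        seen.add(word)
        latest = max(kanji, key=position.__getitem__)
        assigned[latest].append(word)
    return assigned

result = associate_words(["日", "本", "語"], ["日本", "本", "日本語", "日本", "英語", "お日さま"])
print(result)  # {'日': ['お日さま'], '本': ['日本', '本'], '語': ['日本語']}
```

Because a word is filed under the latest kanji it uses, every kanji in it has already been studied by the time the word shows up.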
#36
aldebrn Wrote:General question: do you *not* want to integrate this with Anki?
That's right, I don't need to integrate it into Anki as I know most of the words already. I would hand-write the Kanji and not bother with grading. It's just for the purpose of random testing. But I think the English is needed because there are a lot of words which are written the same in kana but have different Kanji. As for where the English would come from, a dictionary lookup would be perfect.

aldebrn Wrote:What about this idea: if you paste in real text (rather than a dictionary), perhaps the quizzing app could use cloze-deletion? You see the surrounding few sentences, and have to fill in the blank, maybe with MeCab-derived kana as a hint, and no need for English. I can see that as being more engaging and less robotic than kana/Eng=>kanji. What do you think?
That's a very clever suggestion too. You mean writing the Kanji based on kana+sentence context, right? Something like:
私は日本語を [ ... ] しています。 (べんきょう) and write the Kanji. Is that what you mean?
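
A cloze prompt like the one above could be generated with something as small as this. It is only a sketch: `make_cloze` is a made-up name, and the word and its reading are passed in by hand here rather than derived with MeCab.

```python
def make_cloze(sentence, word, reading, blank=" [ ... ] "):
    """Blank out the first occurrence of `word` and append its kana reading as a hint."""
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return f"{sentence.replace(word, blank, 1)} ({reading})"

print(make_cloze("私は日本語を勉強しています。", "勉強", "べんきょう"))
# 私は日本語を [ ... ] しています。 (べんきょう)
```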
#37
What I'm doing as a temporary solution at the moment is:

Using my main Anki vocabulary deck and creating a Filtered Deck which displays 100 random "production" cards, i.e. cards which show the English meaning only (no Kana reading). Of course, the main problem is that it is a random process which has no correlation to how many Heisig frames I've covered.

When I finish all the Heisig frames, I suppose I can just use Filtered decks in this way and in theory I should know how to write all the words. I suppose I could also make a clone deck of my main Anki vocab deck and within that make another card type which shows only English+kana reading. And then use that to make Filtered Deck random tests.
#38
aldebrn Wrote:
yudantaiteki Wrote:What's the point? Just for fun to see how small such a list can be?
I too am curious. Are you going to try and write an epic poem using only these compounds? If so, maybe a better-ranked list would be better :)
This list is great...I looked up some of these compounds in the Denshi Jisho online dictionary and so far they look like reasonable (i.e. not too obscure) words. Thanks again.

I plan to fill in the meanings of these compounds and study the list just as a fun additional way of reviewing the RTK1 kanji. Any kind of interaction with the kanji is a good thing. I can play around with this list as a change from the usual routine.

Here's my next question...is there a way to automate looking these up in the Denshi Jisho?
#39
john555 Wrote:This list is great...I looked up some of these compounds in the Denshi Jisho online dictionary and so far they look like reasonable (i.e. not too obscure) words. Thanks again.

I plan to fill in the meanings of these compounds and study the list just as a fun additional way of reviewing the RTK1 kanji. Any kind of interaction with the kanji is a good thing. I can play around with this list as a change from the usual routine.

Here's my next question...is there a way to automate looking these up in the Denshi Jisho?
So what you find valuable in the list isn't so much its "minimax" property (attempting to use as few compounds to cover as many kanji as possible), but rather how it's essentially a random list of compounds (that you can improve your vocab with) with kanji repeats somewhat rare (so you can use it to practice RTK). We can make lots of shorter lists that have these latter two features: compounds in no special order (always surprising, like a fancy restaurant) with no kanji repeated, if upon further reflection that's something you're interested in... though I myself vastly prefer to deal with content that comes from real (non-dictionary) sources, so I'd probably try it with a corpus of modern Japanese instead of a dictionary.

As for automation, Jisho apparently also uses Edict (what, are there NO other open J-E dictionaries out there?) so I just searched Edict for each compound to make this file: https://gist.github.com/fasiha/54b2ff1b3...itions-txt (it's ugly, but it should be easy to chop it into whatever shape is desired).
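
If you want to redo the lookup yourself, a rough sketch might look like this. It is not the script I used: the function names are made up, and it assumes each Edict line has the layout `WORD [READING] /gloss 1/gloss 2/`, with kana-only entries omitting the bracketed reading. Check the encoding and exact layout of whichever Edict file you download before relying on it.

```python
import re

# Assumed line layout: "WORD [READING] /gloss 1/gloss 2/"; "[READING]" is optional.
EDICT_LINE = re.compile(r"^(\S+)\s+(?:\[([^\]]+)\]\s+)?/(.*)/\s*$")

def parse_edict(lines):
    """Build {word: [(reading, [glosses...]), ...]} from Edict-style lines."""
    index = {}
    for line in lines:
        m = EDICT_LINE.match(line)
        if not m:
            continue  # skip headers or lines that do not fit the assumed layout
        word, reading, glosses = m.groups()
        index.setdefault(word, []).append((reading or word, glosses.split("/")))
    return index

def lookup(index, compounds):
    """Return the entries for each compound; an empty list means it was not found."""
    return {c: index.get(c, []) for c in compounds}

sample = ["勉強 [べんきょう] /(n,vs) study/diligence/", "ここ /(n) here/"]
print(lookup(parse_edict(sample), ["勉強", "不明"]))
```

Any trailing fields that a particular Edict release puts between the slashes will simply show up as extra glosses, so the output may need trimming.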

Wow, some of these are pretty technical/obscure!
#40
choubatsu Wrote:That's a very clever suggestion too. You mean writing the Kanji based on kana+sentence context, right? Something like:
私は日本語を [ ... ] しています。 (べんきょう) and write the Kanji. Is that what you mean?
This. Give me a few days, I'm in the throes of manual labor building a kanji dependency graph right now, but this is easy to do and will make a good complement to Kanjiwild.