
Tracking/Analyzing Japanese Knowledge

#26
dennybritz Wrote:For the grammar I see no other way than manually inputting data into the system and then matching these entries against a given text. I would need to build a database similar to jgram.org. Unfortunately jgram.org does not store enough information for each grammar entry in order to be useful for text analysis.
You're right, I guess some sort of pattern matching would be the easiest way to go about it.
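To make the pattern-matching idea a bit more concrete, here is a rough sketch of what I have in mind. The three grammar entries and their regular expressions are made up for illustration; a real database would need far more entries and probably smarter patterns than plain regexes.

```python
import re

# Hypothetical hand-entered grammar entries: (label, compiled pattern).
# These are illustrative only, not taken from jgram.org or any real database.
GRAMMAR_PATTERNS = [
    ("〜なければならない (must do)", re.compile(r"なければならない|なければなりません")),
    ("〜てもいい (may do)", re.compile(r"[てで]もいい")),
    ("〜ことができる (can do)", re.compile(r"ことができ(る|ます)")),
]


def find_grammar(text):
    """Return (label, matched_text, start_index) for every hit, in text order."""
    hits = []
    for label, pattern in GRAMMAR_PATTERNS:
        for m in pattern.finditer(text):
            hits.append((label, m.group(0), m.start()))
    # Sort by position so the hits follow the order of the text.
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    sample = "宿題をしなければならない。テレビを見てもいいですか。"
    for label, matched, start in find_grammar(sample):
        print(start, matched, label)
```

Each entry only needs a label and a pattern, so adding a grammar point means adding one line of data, which is about as manual as you described.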

dennybritz Wrote:There is a similar problem with vocabulary. Right now I am using a dictionary to classify/extract vocabulary, which I originally thought was a good idea. However I realized that there are quite a few problems with this approach. Most of the dictionary entries are ambiguous or contain too much information to be useful (too many kanji/kana readings many of which are basically never used). Also, there are probably a lot of phrases which are not included in a dictionary. Also, words often contain several very distinct meanings (and readings) but they are all associated with one dictionary entry. Now I think that vocabulary (and the corresponding flashcards) should be created manually, but it must be in a unified way (i.e. there shouldn't be two different entries for the same vocabulary word/meaning).
One method is to try to match each word against a dictionary and, if it's not found, handle it the dumb way (i.e., fall back to plain string equality and lose out on matching conjugated forms, etc.). You could even match only against the dictionary's common words (e.g., entries marked "(P)"). That might be a useful compromise, or at least a stopgap until you have manually entered data.
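Roughly, the fallback could look like the sketch below. The tiny `DICTIONARY` table is invented for the example (the `is_common` flag stands in for the "(P)" marker), and I'm assuming the text has already been split into tokens by some other tool.

```python
from collections import Counter

# Hypothetical miniature dictionary: surface form -> (base form, is_common).
# A real table would be built from a dictionary file; this one is made up.
DICTIONARY = {
    "食べる": ("食べる", True),
    "食べた": ("食べる", True),
    "食べて": ("食べる", True),
    "本": ("本", True),
    "嚥下": ("嚥下", False),
}


def vocab_key(token, common_only=False):
    """Map a token to its base form if the dictionary knows it, else to itself."""
    entry = DICTIONARY.get(token)
    if entry is not None and (entry[1] or not common_only):
        return entry[0]
    # The "dumb" fallback: plain string equality, so conjugated forms
    # of unknown words are counted as separate items.
    return token


def count_vocab(tokens, common_only=False):
    """Count vocabulary items, merging known conjugated forms."""
    return Counter(vocab_key(t, common_only) for t in tokens)


if __name__ == "__main__":
    tokens = ["食べた", "食べて", "ググった", "ググる", "本"]
    print(count_vocab(tokens))
```

Known words collapse into one entry, while the unknown ones ("ググった" and "ググる" here) stay separate until someone adds them by hand.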

dennybritz Wrote:I recently thought of a system like the following:
Everyone can add, edit, remove or classify flashcards (adding fields as necessary), which are linked to dictionary entries for more information. Basically, this would yield a large collection of *unified* flashcards for kanji, vocabulary, grammar and sentences, tagged and categorized for different decks (such as jlpt or kanken level). These flashcards are then used to make up your own decks (using only the fields you want), reviews, and analyze texts. So, basically it would be a community effort to put all data about Japanese into reviewable and analyzable flashcard form ;) Based on that, personal suggestions and text analysis would be much easier.
A centralized fact database to construct decks from would certainly be interesting.
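Just to sketch how I picture it: each shared fact carries its fields and tags, and a personal deck is a filtered view that keeps only the fields you want. The `Fact` class, the field names, and the tags below are all my own assumptions, not anything that exists yet.

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """One shared, unified entry (hypothetical layout)."""
    fact_id: str
    fields: dict
    tags: set = field(default_factory=set)


def build_deck(facts, tag, wanted_fields):
    """Pick the facts carrying `tag` and keep only the requested fields."""
    deck = []
    for fact in facts:
        if tag in fact.tags:
            card = {name: fact.fields[name]
                    for name in wanted_fields if name in fact.fields}
            deck.append(card)
    return deck


if __name__ == "__main__":
    facts = [
        Fact("v1",
             {"expression": "食べる", "reading": "たべる", "meaning": "to eat"},
             {"jlpt5", "vocab"}),
        Fact("k1",
             {"expression": "鬱", "meaning": "gloom"},
             {"kanken2", "kanji"}),
    ]
    print(build_deck(facts, "jlpt5", ["expression", "meaning"]))
```

Since every deck points back at the same facts, an edit to a fact would show up in everyone's decks, which is where the "unified" part comes from.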