
Tracking/Analyzing Japanese Knowledge

#1
Hello,

Like many people here I have tried a lot of different things to help me make my Japanese studies more effective. I can't even imagine where I would be without Anki ;) However, one thing that always bothered me is that we are lacking integrated solutions combining all the things we are using. Anki, RevTK, subs2srs, smart.fm, readthekanji.com, mobile apps, lang-8, jpod101, other websites, etc. There is so much useful stuff out there but all of these are independent of each other and serve a specific purpose (and they do it well).

One problem with Anki is that it's a general-purpose SRS. It doesn't know about your Japanese knowledge. You can add a new Vocabulary Item to Anki but already have it in 3 other decks or 20 sentences. Anki won't know. Anki doesn't even know if you are currently reviewing Kanji, Vocabulary, Sentences, or something else. Of course, it's not the goal of Anki to do that and I am not saying that it should be. But wouldn't it be nice if you actually knew how much you know? If you could tell exactly how much more vocabulary or grammar you actually need to learn in order to pass the JLPT N2 (or whatever test)? If you want to read that news article (or watch a drama episode) and you could exactly tell how many words you already know and what is missing in order to understand it? Or if you could get recommendations on what to read, watch or study next, based on your current knowledge?

My goal is to create such a website/tool. Its main goal will be to keep track of a learner's knowledge in 4 areas: Kanji, Vocabulary, Grammar, Sentences. Not simply if you know something or not, but also *how well* you know it, i.e. how well you perform on reviews of certain items. Based on the learner's knowledge I would then create highly personalized content. For example, I could find news articles or TV shows which use/cover the majority of a user's knowledge. I can find the perfect i+1 sentences to practice and learn something at the same time. Or I could tell exactly how/where the learner needs to improve in order to, for example, pass the JLPT N2. And maybe most importantly, it would go together with whatever primary method of study you are using. If you are taking a class or reading a textbook or learning from RTK, you simply add the knowledge to the system and everything else is done for you.

I have been working on it for a few weeks now and the functionality is still rather limited. You are now able to track your Kanji and Vocabulary Knowledge and review it using a simple SRS. The next steps will be adding Sentences and Grammar. Then I would try to include native media such as books, news and videos, personalized towards each user's knowledge.
You can get an overview of the current features here: http://www.languagebundle.com/tour

I am currently debating if I should continue with the project or not. That's my reason for posting here. Provided it will be free to use, would you use such a system? I want to hear any feedback or input you have :)

Thank you!
Edited: 2010-10-23, 5:45 pm
#2
Reminds me of: http://www.ted.com/talks/gary_wolf_the_q..._self.html
#3
I can certainly see some benefits of what you are trying to achieve. However, nothing is going to replace Anki for me, so I would love to have a plugin that could export stats from my Anki decks somehow.

One thing I would love, is to toss a script of something at something like your website, say an entire drama series, then it would go and pick out the most common words and suggest to me to learn these. In an order that would be most beneficial of course. Then I would export this into my Anki decks learn them there, and then return when I need something else to focus on.
Edited: 2010-10-23, 6:02 pm
#4
Quote:Reminds me of: http://www.ted.com/talks/gary_wolf_the_ … _self.html
Haha, well. As long as it helps me to improve faster I am totally for any kind of tracking Wink

Quote:I can certainly see some benefits of what you are trying to achieve. However, nothing is going to replace Anki for me, so I would love to have a plugin that could export stats from my Anki decks somehow.
I was thinking about adding exactly this feature. I also have lots of stuff in Anki which I don't simply want to "lose". It's pretty high on my list of things to add but I haven't looked at Anki's export format and how much information it includes yet. I'll look into that.

EDIT: Just looked at the Anki export format. Definitely possible to do that, will be added next.

Quote:One thing I would love, is to toss a script of something at something like your website, say an entire drama series, then it would go and pick out the most common words and suggest to me to learn these. In an order that would be most beneficial of course. Then I would export this into my Anki decks learn them there, and then return when I need something else to focus on
Good idea, and shouldn't be too hard to do. Will probably add it soon.
Edited: 2010-10-23, 6:33 pm
#5
nest0r Wrote:Reminds me of: http://www.ted.com/talks/gary_wolf_the_q..._self.html
Haha, I've been tracking my sleeping habits for the past year. In fact, today was the first time I woke up by noon and got a decent amount of sleep since the end of August. XD

dennybritz Wrote:
Daichi Wrote:One thing I would love, is to toss a script of something at something like your website, say an entire drama series, then it would go and pick out the most common words and suggest to me to learn these. In an order that would be most beneficial of course. Then I would export this into my Anki decks learn them there, and then return when I need something else to focus on
Good idea, and shouldn't be too hard to do. Will probably add it soon.
I look forward to seeing updates then, consider this topic subscribed.
#6
I implemented functionality for importing Kanji and Vocabulary Knowledge from an existing Anki deck. It can be found here:

- http://www.languagebundle.com/kanji/import_anki for Kanji.
- http://www.languagebundle.com/vocabulary/import_anki for Vocabulary.

This will import facts as well as scheduling information, which is why you need to upload your complete Anki deck. Uploading and processing might take a while if you have a lot of words/kanji.
#7
I love the idea of your site! I tried uploading both vocab and kanji decks, but in the first case got the message "something went wrong", and in the second - only 1 kanji was imported.
#8
Thank you =) Hm, that's strange. Are you sure you put in the correct field #? I only tested it with 2 different decks but it seemed to work fine. I will look at the log file and test it with other decks and post back here.
#9
I'll try it out as well.
#10
Yeah, pretty sure. I also tried it with 2 decks. And now it doesn't even let me use the Vocab page - I always get the message "something went wrong", even when logged in with a different account.

By the way, what dictionary does it use?
#11
I think this is a great idea, thanks for sharing it!
#12
Lindley Wrote:Yeah, pretty sure. I also tried it with 2 decks. And now it doesn't even let me use the Vocab page - I always get the message "something went wrong", even when logged in with a different account.

By the way, what dictionary does it use?
Sorry about that, I was trying to fix the import ;) It should work again now. I changed some stuff in the import function and it let me import 3 different vocabulary decks (unfortunately I don't have more to try it on). Could you try the vocabulary import again and see if it works now? If not, would you mind telling me what kind of deck you are trying to import (what do the fields, words, etc. look like and how big is it)?

It uses the jmdict and kanjidic from http://www.csse.monash.edu.au/~jwb/j_jmdict.html / http://www.csse.monash.edu.au/~jwb/kanjidic.html
#13
Seems like Vocab is working now Smile

Here's what I used:

Japanese RTK deck: 1 - Frame #, 2 - kanji, 3 - word, 4 - word in hiragana, 5 - example sentence.
Total items: 1871. Total size: 2.66 Mb. Result: 0 imported via Kanji import.

Basic Kanji Book deck: 1 - kanji, 2 - word in kana, 3 - translation.
Total items: 4872. Total size: 4.77 Mb. Result: 0 imported via Kanji import.

My own vocab deck: 1 - word, 2 - translation, 3 - word in hiragana.
Total items: 1914. Total size: 2.46 Mb. Result: 273/281 imported via Vocab import. 18 imported via Kanji import.

Core2k deck: 1 - word in kanji, 2 - word in furigana, 3 - word in kana, 4 - translation, 5-7 - extra info, 8 - example sentence, 9-11 - info for the sentence.
Total items: 514. Total size: 1.93 Mb. Result: 113/115 imported via Vocab import. 25 imported via Kanji import.

I've also got some questions:

Why such a big difference between "total items" in the deck and "items uploaded"? There can't be duplicates in Anki, and even if we halve the total items (if those were cards, not facts), still the difference is more than half. If we take the import of Core2k via Kanji import, there're many more kanjis in the deck than the imported 25.

Do you think it would be feasible to make a list of all vocab one has in the database? Kinda like the list of all kanji? Maybe even group it by JLPT levels?

Can one see how many items one needs to review on the site?

Overall, your site is awesome Smile The possibility to get a parsed text, study words/kanjis, mark them learned, and have it all at one place is priceless!
#14
I'm still not sure why the Kanji Import doesn't work for you. Do you mind sending me one of your Kanji Decks so I can try it and see what's going wrong?

Quote:Why such a big difference between "total items" in the deck and "items uploaded"?
Both Kanji and Vocabulary Import functions import only items that have been reviewed at least once. Items that haven't been reviewed at all imply that those were not learned yet so I thought there is no reason to import them. Does this make sense, or have you reviewed all of your 1914/514 items at least once? Then that would be a problem.

Quote:If we take the import of Core2k via Kanji import, there're many more kanjis in the deck than the imported 25.
I think the number might be right. The Kanji Import function was actually intended for Kanji Decks only so that it only imports Kanji that are alone in their own field. That means that, for example, 日本 will result in 0 imports, but 本 alone would result in one Kanji import. And the Core2k deck has mostly words (not Kanji). I could add functionality to import all the Kanji in each word, if that is useful.
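To make the difference concrete, here is a rough Python sketch of the two behaviours. It is only an illustration, not the site's actual import code: the function names are made up, and "kanji" is simplified to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so marks such as 々 are not counted.

```python
from typing import List, Optional


def is_kanji(ch: str) -> bool:
    """Simplified check: basic CJK Unified Ideographs block only (assumption)."""
    return "\u4e00" <= ch <= "\u9fff"


def single_kanji_field(field: str) -> Optional[str]:
    """Behaviour described above: a field counts only if it is exactly one kanji."""
    text = field.strip()
    return text if len(text) == 1 and is_kanji(text) else None


def kanji_in_word(word: str) -> List[str]:
    """Possible extra option: every distinct kanji in a word, in order of appearance."""
    seen: List[str] = []
    for ch in word:
        if is_kanji(ch) and ch not in seen:
            seen.append(ch)
    return seen
```

With this, `single_kanji_field("日本")` gives nothing to import while `single_kanji_field("本")` gives one kanji, and `kanji_in_word("日本")` would pick up both characters.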

Quote:Do you think it would be feasible to make a list of all vocab one has in the database? Kinda like the list of all kanji? Maybe even group it by JLPT levels?
Yes, I'm planning to do that. The problem with this is that I don't have vocab lists for the new JLPT since they are not being officially published. I could use the "old" JLPT lists but I'd rather work with the new one only. I thought about going through the sample tests for the new JLPT and extract the vocabulary for the appropriate levels from there, but that might take some time. If anyone here has other ideas, please let me know Smile

Quote:Can one see how many items one needs to review on the site?
Yes, if you go to the Kanji/Vocab overview page then below the "Review Statistics" graph you can see the number of "Due" cards.

Quote:Overall, your site is awesome smile The possibility to get a parsed text, study words/kanjis, mark them learned, and have it all at one place is priceless!
Thank you so much Smile I will continue working on the site and improve and add features gradually. Right now it's far from perfect and I'm hosting it on a very slow server. Hope to change that sometime soon.
Edited: 2010-10-27, 1:55 pm
#15
Quote:I'm still not sure why the Kanji Import doesn't work for you. Do you mind sending me one of your Kanji Decks so I can try it and see what's going wrong?
Sent to info@languagebundle.com

Quote:Does this make sense, or have you reviewed all of your 1914/514 items at least once?
That could be it. I don't think I went through all the items in any of the decks...

Quote:The problem with this is that I don't have vocab lists for the new JLPT since they are not being officially published.
I've seen unofficial compiled lists for the new JLPT on the web, so maybe you don't need to go through the whole extracting process yourself. A Google search turns up at least a couple of results. In the email I included some links that I used.

Quote:I could add functionality to import all the Kanji in each word, if that is useful.
Well, I think it'd be useful - when studying a new word you also study all the characters, right? So going and manually clicking each kanji "learned" after importing the vocab seems kinda pointless...What'd you think?

Quote:you can see the number of "Due" cards
Duh...Sleepless nights have their consequences Smile

Also, I don't know whether this is feasible or even necessary, but what about exporting vocab from the site? For example, I just love using text analyzer feature - running dorama scripts through it to learn words is a great help. But then I don't have the newly-learned words in Anki, and since I do most of the reviews on the go/during classes, I lose the ability to review them. Any thoughts?
#16
Thank you!

I think I found the problem with importing the decks. I didn't correctly set the field number, please try again, it should be working now.

Quote:I've seen unofficial compiled lists for the new JLPT on the web, so maybe you don't need to go through the whole extracting process yourself. A Google search turns up at least a couple of results. In the email I included some links that I used.
Alright, thanks, I'll look into that. If the lists are good, I'll definitely include a vocabulary list for the JLPT.

Quote:Well, I think it'd be useful - when studying a new word you also study all the characters, right? So going and manually clicking each kanji "learned" after importing the vocab seems kinda pointless...What'd you think?
Personally I learn Kanji and Words 'separately' and there are quite a few words I can recognize without being exactly sure about each Kanji. But I can definitely see your point, so I'll try to include a checkbox to enable that option.

Quote:Also, I don't know whether this is feasible or even necessary, but what about exporting vocab from the site? For example, I just love using text analyzer feature - running dorama scripts through it to learn words is a great help. But then I don't have the newly-learned words in Anki, and since I do most of the reviews on the go/during classes, I lose the ability to review them. Any thoughts?
Yep, an export feature is on my list of things to include! I think the text analyzer doesn't work very well yet, so I'll try to improve that first.
#17
Quote:Yep, an export feature is on my list of things to include! I think the text analyzer doesn't work very well yet, so I'll try to improve that first
Awesome! Will be eagerly awaiting both.

Quote:But I can definitely see your point, so I'll try to include a checkbox to enable that option
Would be very useful, thanks!

Quote:I didn't correctly set the field number, please try again, it should be working now.
Yes, seems like it's okay now - I tried a deck of 3007 cards, 600 of which were reviewed at least once, and it imported 594 kanji. The total list now doesn't look as daunting Wink

If I can somehow help you run more tests or anything - let me know. I'd love to see this project perfected. Keep up the good work!
#18
I implemented a feature called "Vocabulary Lists" and made some modifications to the Vocabulary Display. More Details here: http://languagebundle.wordpress.com/2010...rovements/

Right now I only have Lessons for the JLPT N5. The corresponding list can be found here: http://languagebundle.com/vocabulary/list/official,n5 . I hope to add N4 next week and N3 the week after. It takes a while because I am entering the words from the JLPT lists manually. Too many words share the same reading, so I have to choose the correct dictionary entry by hand even though I have a vocabulary list.
#19
Thanks for the improvements! One other thing I noticed - when reviewing vocab some words have their reading in parentheses on the front of the card, and some don't. It's no big deal, but it kinda lets you cheat if you're reviewing word pronunciation. Is this a bug or am I missing something here?
#20
Right now it shows the Kana Readings only if you have not learned the Kanji used in the vocabulary word. If you have learned the Kanji contained in the word it shows the Kanji Reading only. My plan is to soon improve the SRS usability (make it look nicer and include user settings). I'll then include an option to always turn off/on Kana Readings. Do you think that's a good idea or what options would you like?
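In rough Python, the display rule plus the planned setting could look like the sketch below. This is only an illustration under my own assumptions: the function name, the `mode` values and the "any unlearned kanji" interpretation are made up for the example, and kanji detection is simplified to the basic CJK block.

```python
from typing import Set


def show_kana_reading(word: str, learned_kanji: Set[str], mode: str = "auto") -> bool:
    """Decide whether the kana reading appears on the front of a vocab card.

    mode is a hypothetical user setting: "always", "never", or "auto".
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    # "auto": show the reading if the word contains at least one kanji
    # the user has not learned yet (assumed interpretation).
    kanji = [ch for ch in word if "\u4e00" <= ch <= "\u9fff"]
    return any(k not in learned_kanji for k in kanji)
```

So with only 日 learned, 日本 would still show its kana reading in "auto" mode, and it would disappear once 本 is learned as well.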

Oh yeah, this weekend I'll have a JLPT3 list as well as example sentences, some of which include audio (From Core6000). I also added Furigana support so that all readings will be displayed as furigana instead of after the word.
#21
Ah, I see. Sure, option to turn on and off the readings would be great.

I've got a couple more questions:
- numpad keys while reviewing don't work. Can they be mapped alongside the regular ones?
- romaji search. I don't think it's that necessary, but if it's easy to add - it'd save (at least me) the hassle of typing into Wakan editor and copypasting into vocab search. Japanese support miraculously vanishes from my computer on a weekly basis :/
- do you plan to add Grammar to the site? Maybe JLPT-based lists? Or (which I think would be too tedious) to tag certain words and constructions with appropriate JLPT-level labels.
- displaying the amount of cards left for review in SRS - yay or nay?

Thanks for all the hard work! Looking forward to improvements Smile

P.S. Forgot to ask: Say, I import vocab from decks that contain multiple fields - vocab, kanji, sentences etc. After the import, the words are marked according to how well I know them. But unless I import the kanji from the same deck or mark them as known manually, the kanji on the site won't be affected, right? Also, if the deck I'm importing from wasn't designed to have kanji as a question, but rather as an answer or an additional field, how then does the site rate the kanji after import? I'm a little confused here.
Edited: 2010-11-06, 11:21 am
#22
Hey,

Thanks again for your feedback, it always gives me more ideas of things to add Smile

Quote:numpad keys while reviewing don't work. Can they be mapped alongside the regular ones?
Okay sure, I'll add it. I only have a laptop so I didn't think about the numpad Wink I'll add that together with the update this weekend.

Quote:romaji search. I don't think it's that necessary, but if it's easy to add - it'd save (at least me) the hassle of typing into Wakan editor and copypasting into vocab search. Japanese support miraculously vanishes from my computer on a weekly basis
Okay, I'll add that with the update this weekend too. The only problem here is distinguishing between English and romaji input. Which one do you think is better:
- Have two search boxes, one for Japanese (Romaji or Kanji/Kana), one for English
- Have only one search box and treat input as romaji if no English word is found

Quote:do you plan to add Grammar to the site? Maybe JLPT-based lists? Or (which I think would be too tedious) to tag certain words and constructions with appropriate JLPT-level labels.
Yes, I was planning to add grammar from the beginning but right now it's at the bottom of my list. It seems like a lot of work because I need to input the grammar constructs manually. I then wanted to match sentences against these constructs to determine which sentence uses what grammar construct. I was thinking about copying most of the Grammar from the Kanzen Master books as they seem to have good grammatical explanations for syntax.

The JLPT tagging is a good idea. Actually I was thinking about adding a little JLPT tagging feature for both vocabulary and Sentences so that every user can tag items to their appropriate level. I was just worried that some users might mess with it just for the fun. But now... the more I think about it, I don't think that'll happen. I might actually add this with the update this weekend as well if I have time.

Quote:displaying the amount of cards left for review in SRS - yay or nay?
Okay, when I work on the SRS I'll add an option to do that. Personally I didn't really like it because the progress display in Anki always distracted me from the actual reviews. Actually there was a thread somewhere in these forums where someone said that turning off the review progress display dramatically improved his performance Wink

Quote:P.S. Forgot to ask: Say, I import vocab from decks that contain multiple fields - vocab, kanji, sentences etc. After the import, the words are marked according to how well I know them. But unless I import the kanji from the same deck or mark them as known manually, the kanji on the site won't be affected, right? Also, if the deck I'm importing from wasn't designed to have kanji as a question, but rather as an answer or an additional field, how then does the site rate the kanji after import? I'm a little confused here.
Yes, Vocabulary Import only affects vocab, Kanji only affects Kanji. The site only looks at the field for the field number you specify and then looks at the scheduling interval for the whole flash card. (0-2 Days: Just Learned, 2-10 Days: Good, 10-50 Days: Very Good, 50+ Days: Perfect). So the site won't know if your Kanji field is the "primary/question" field or not, it will simply use the interval for the whole flash card. The Kanji Import function was actually intended for Kanji Decks and not necessarily vocab decks having kanji fields.
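As a small Python sketch, the interval mapping works roughly like this. The function name is just for illustration, and since the ranges above overlap at 2, 10 and 50 days, treating each lower bound as inclusive is an assumption made for the example.

```python
def knowledge_level(interval_days: float) -> str:
    """Map a card's scheduling interval (in days) to a knowledge level.

    Boundary handling (2, 10 and 50 go to the higher level) is an assumption.
    """
    if interval_days < 2:
        return "Just Learned"
    if interval_days < 10:
        return "Good"
    if interval_days < 50:
        return "Very Good"
    return "Perfect"
```

The same level is applied to whatever the chosen field contains, whether or not that field is the question side of the card.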

Thanks again!
#23
Thanks for the update! It works like a charm Smile You definitely made my learning much more enjoyable.
#24
dennybritz Wrote:One problem with Anki is that it's a general-purpose SRS. It doesn't know about your Japanese knowledge. You can add a new Vocabulary Item to Anki but already have it in 3 other decks or 20 sentences. Anki won't know. Anki doesn't even know if you are currently reviewing Kanji, Vocabulary, Sentences, or something else. Of course, it's not the goal of Anki to do that and I am not saying that it should be. But wouldn't it be nice if you actually knew how much you know? If you could tell exactly how much more vocabulary or grammar you actually need to learn in order to pass the JLPT N2 (or whatever test)? If you want to read that news article (or watch a drama episode) and you could exactly tell how many words you already know and what is missing in order to understand it? Or if you could get recommendations on what to read, watch or study next, based on your current knowledge?
I've been using the JxPlugin, which can report some of these things (eg, how many of the JLPT3 vocab are in your deck, have been reviewed, are mature, etc), although it's buggy, it doesn't combine information across all decks (afaik), nor does it keep track of grammar knowledge. It'd be great to have an improved version.

Knowing what vocab and grammar points you need to study in order to understand a given novel/episode would be extremely handy if you could actually analyze the grammar, but I'm not knowledgeable enough in linguistics to judge this.

dennybritz Wrote:My goal is to create such a website/tool. Its main goal will be to keep track of a learner's knowledge in 4 areas: Kanji, Vocabulary, Grammar, Sentences. Not simply if you know something or not, but also *how well* you know it, i.e. how well you perform on reviews of certain items. Based on the learner's knowledge I would then create highly personalized content. For example, I could find news articles or TV shows which use/cover the majority of a user's knowledge. I can find the perfect i+1 sentences to practice and learn something at the same time. Or I could tell exactly how/where the learner needs to improve in order to, for example, pass the JLPT N2. And maybe most importantly, it would go together with whatever primary method of study you are using. If you are taking a class or reading a textbook or learning from RTK, you simply add the knowledge to the system and everything else is done for you.

I am currently debating if I should continue with the project or not. That's my reason for posting here. Provided it will be free to use, would you use such a system? I want to hear any feedback or input you have.
I think such a tool would be extremely useful, particularly if you split it into a handful of separate open source components and then had a website that simply integrated them.

For example:

-- Knowledge analysis tools
1) A program that analyzes grammar, vocab, etc across all decks and spits out that data in some structured way.
2) A program that, given an input text (eg article, light novel, drama/anime transcript), analyzes the grammar and vocab used (quantifying it in some way compatible with #1).
3) A program that given your knowledge from #1 and your goal text from #2, lists what new grammar/vocab you need (perhaps with a frequency rating to sort by) and optionally what knowledge you need to work on (eg, vocab/grammar that's been seen but isn't mature).

-- Suggestion tools
4) A database that maps articles, show transcripts, and light novels to the data obtained from #2.
5) A website/tool that allows people to submit data to the database #4. For copyrighted works, just store the data resulting from analysis by program #2, but for other works you can store the text as well.
6) A tool that compares a user's knowledge from #1 against database #4 (by using the comparing program #3) and rates all the media in the database according to how much more knowledge you need.
7) A tool/website frontend that uses #6 to suggest fairly i+1 media.
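
To give an idea of how small component #3 could be, here is a minimal Python sketch. It assumes the text has already been split into dictionary-form words by some morphological analyzer (not shown), and that your knowledge from #1 is simply a set of known words; the function names are made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def missing_vocab(known: Set[str], tokens: Iterable[str]) -> List[Tuple[str, int]]:
    """Words in the text that are not known yet, most frequent first."""
    counts = Counter(tokens)
    return [(word, n) for word, n in counts.most_common() if word not in known]


def coverage(known: Set[str], tokens: List[str]) -> float:
    """Fraction of running words in the text that are already known."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


# Hypothetical pre-tokenized input, for illustration only.
tokens = ["猫", "が", "魚", "を", "食べる", "猫", "が", "寝る", "魚", "魚"]
known = {"猫", "が", "を"}
todo = missing_vocab(known, tokens)
ratio = coverage(known, tokens)
```

Tool #6 could then rank every entry in the database by `coverage` and suggest the ones just below some threshold.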


Integration is a good thing, but it's ideally done by gluing together a handful of smaller, extremely specialized, components (ie, the "unix" way). Also, a number of small, open source components allows people to use them in new, interesting combinations that you might not have originally thought of.
#25
Lindley Wrote:Thanks for the update! It works like a charm You definitely made my learning much more enjoyable.
Happy to hear that! Smile I will most likely improve on the text analysis feature and add vocabulary import functionality next week.

overture, I completely agree with you that a system should be made up of small independent components. I think most of what you listed is not technically difficult to realize, the only problem is getting the initial data for vocabulary and grammar.

For the grammar I see no other way than manually inputting data into the system and then matching these entries against a given text. I would need to build a database similar to jgram.org. Unfortunately jgram.org does not store enough information for each grammar entry in order to be useful for text analysis.

There is a similar problem with vocabulary. Right now I am using a dictionary to classify/extract vocabulary, which I originally thought was a good idea. However I realized that there are quite a few problems with this approach. Most of the dictionary entries are ambiguous or contain too much information to be useful (too many kanji/kana readings many of which are basically never used). Also, there are probably a lot of phrases which are not included in a dictionary. Also, words often contain several very distinct meanings (and readings) but they are all associated with one dictionary entry. Now I think that vocabulary (and the corresponding flashcards) should be created manually, but it must be in a unified way (i.e. there shouldn't be two different entries for the same vocabulary word/meaning).

That is one of the problems with SRS decks as I mentioned above. SRS are not "smart" enough to see the similarity between two flashcards which basically test the same thing, especially not if they are in different decks. Therefore SRS cannot classify knowledge reliably. The same goes for JxPlugin. As far as I know it looks at all the flashcards without being "smart" about it. For example, if it finds a matching JLPT word or kanji in some card it will mark it as known or seen. However, the card could be a sentence card where you paid no attention to that kanji or word. If the word differs slightly, it won't recognize it. There will also be a lot of false positives, as many words are ambiguous or can be used as part of other words. You can't add new words or phrases to its database. I agree that the plugin statistics are nice to look at (especially to see what a shared deck actually covers), but in my opinion they are not at all representative of your actual knowledge of the facts! And again, statistics are just statistics; I think what is much more interesting is what you can actually do by analyzing the statistics (such as the transcript analysis you mentioned).

I recently thought of a system like the following:
Everyone can add, edit, remove or classify flashcards (adding fields as necessary), which are linked to dictionary entries for more information. Basically, this would yield a large collection of *unified* flashcards for kanji, vocabulary, grammar and sentences, tagged and categorized for different decks (such as JLPT or Kanken level). These flashcards are then used to build your own decks (using only the fields you want), to run reviews, and to analyze texts. So, basically it would be a community effort to put all data about Japanese into reviewable and analyzable flashcard form Wink Based on that, personal suggestions and text analysis would be much easier.