kanji koohii FORUM
New program for vocabulary and sentence mining


New program for vocabulary and sentence mining - gh123 - 2010-01-28

I’ve been developing a program for my graduate studies that I thought some people here might find useful. It’s a program for studying Japanese vocabulary that automatically mines sentences from Japanese websites for its exercises. It can also suggest new words to study based on your proficiency level. You can try it at: http://tangonet.dynalias.org/

Please let me know if you have any questions or comments.


New program for vocabulary and sentence mining - spleenlol - 2010-01-28

I like it so far. I'm not advanced enough yet to use it all the way, but it looks great.
Good work.


New program for vocabulary and sentence mining - vosmiura - 2010-01-28

Just started but I like it so far. Great idea.

What kind of spacing algorithm does it use?


New program for vocabulary and sentence mining - Chandlerhimself - 2010-01-29

Great program. Easy and useful. The only question I have is: does it keep track of which words you have gotten correct, or are they just random? For example, if I get a word correct today, will I see it tomorrow? If not, this program is great and extremely useful for people studying for the JLPT. I can easily find examples for words I don't know and put them into Anki, as well as keep track of how many words I still need to learn.


New program for vocabulary and sentence mining - vosmiura - 2010-01-29

It mentions that it has some Spaced Repetition System, and if you look at the "Profile", each word shows when it's next scheduled.

One suggestion: it would be good to be able to somehow remove some System Selected vocab from your study. For example, I got 日本, which required typing "にっぽん". In a case like this, where the reading is ambiguous, I'd rather remove it.


New program for vocabulary and sentence mining - gh123 - 2010-01-29

Thanks for the comments.

vosmiura Wrote:What kind of spacing algorithm does it use?
Right now it is using a fairly simple algorithm. As long as you answer a word's exercise correctly, it will keep increasing that word's interval. But if you answer incorrectly, it will reset that word back to the shortest interval.
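
A minimal sketch of the scheduling rule described above. The thread only says that the interval keeps increasing on correct answers and resets to the shortest interval on a miss; the day-based unit, the doubling factor, and the names `WordSchedule` and `review` are assumptions made for illustration, not the program's actual code.

```python
from dataclasses import dataclass

# Assumption: intervals are counted in days and double after each correct answer.
# The post does not state the real growth rule or the shortest interval.
SHORTEST_INTERVAL_DAYS = 1
GROWTH_FACTOR = 2


@dataclass
class WordSchedule:
    word: str
    interval_days: int = SHORTEST_INTERVAL_DAYS


def review(schedule: WordSchedule, correct: bool) -> WordSchedule:
    """Return the updated schedule after one exercise for this word."""
    if correct:
        # Correct answer: keep increasing the interval.
        new_interval = schedule.interval_days * GROWTH_FACTOR
    else:
        # Incorrect answer: reset to the shortest interval.
        new_interval = SHORTEST_INTERVAL_DAYS
    return WordSchedule(schedule.word, new_interval)
```

Under these assumptions, a word answered correctly twice and then missed goes from 1 day to 2, then 4, then back to 1.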

Chandlerhimself Wrote:The only question I have is: does it keep track of which words you have gotten correct, or are they just random? For example, if I get a word correct today, will I see it tomorrow?
Yes it does keep track of whether or not you get words correct. As vosmiura said, you can look in the Profile section and see when a word will be reviewed next as well as a history of all the exercises (sentences) you've completed.

vosmiura Wrote:One suggestion, it would be good to be able to somehow remove some System Selected vocab from your study.
Thanks for the suggestion. I'll add a way to remove system selected words this weekend.

The tools we use to automatically parse sentences will sometimes pick the wrong reading for words. This mostly happens with counters and numbers. I've tried to remove most of these words from the part of the system that automatically selects vocabulary but if anyone finds any more errors please let me know.


New program for vocabulary and sentence mining - Jarvik7 - 2010-01-29

I don't intend to use this since I don't like/trust cloud services, but what is the methodology for suggesting words? Same JLPT level? Frequency lists? If it's by JLPT level, I guess it's not useful for people beyond 1kyuu level.

It would be interesting to see that functionality in an Anki plugin.


New program for vocabulary and sentence mining - Evil_Dragon - 2010-01-29

Seems not to be too useful for people who aced JLPT1. Too bad, the idea itself is pretty cool.


New program for vocabulary and sentence mining - aphasiac - 2010-01-29

I'm at JLPT level 4; it seems like a really neat tool so far.


New program for vocabulary and sentence mining - zazen666 - 2010-01-29

Anki plugin!!!!!


New program for vocabulary and sentence mining - zazen666 - 2010-01-29

How do I change my level without taking the level check test?


New program for vocabulary and sentence mining - zazen666 - 2010-01-29

Also, one small thing:

If I get the kanji 世界 and I type せいかい, well, the box does not turn green, so I know right away that it is likely せかい. Unless I make sure to mark it wrong on my own, well, I then self-correct it and it gets marked "correct". This might reinforce bad habits. Just my 2 cents.....


New program for vocabulary and sentence mining - zazen666 - 2010-01-29

One more thing: I wonder if an automatic translation might not be helpful? The sentences target the reading of 1 jukugo, which is good, but sometimes there are other unknown words in the sentence. I guess we could use rikaichan though....


New program for vocabulary and sentence mining - Tobberoth - 2010-01-29

Sounds like a great idea, but I find it would be a lot more useful if it could be integrated with anki... if not as a plugin then maybe some form of export function or whatever.


New program for vocabulary and sentence mining - wccrawford - 2010-01-29

The system could be massively improved with just 2 small features:

Automatically focus on the input box when you move to a new question. (Put the typing cursor in the box.)

Accept romaji input and auto-convert to kana, rather than requiring kana input.


New program for vocabulary and sentence mining - wccrawford - 2010-01-29

zazen666 Wrote:also, one small thing:

If I get the kanji 世界 and I type せいかい, well, the box does not turn green, so I know right away that it is likely せかい. Unless I make sure to mark it wrong on my own, well, I then self-correct it and it gets marked "correct". This might reinforce bad habits. Just my 2 cents.....
Oh, I didn't realize it was doing this. I agree with zazen... It should -not- give you any indication that you are correct or not before you tell it to check you.


New program for vocabulary and sentence mining - Thora - 2010-01-29

Nice work, gh123.

How do I know which word in the sentence to type? It doesn't seem to be marked.

For people trying level 4, do the sentences contain level 4 kanji only?


New program for vocabulary and sentence mining - Nukemarine - 2010-01-29

Thora, it's the word with the box over it. Granted, it would be better if the kanji portion being tested were highlighted in some way, I guess.


New program for vocabulary and sentence mining - Thora - 2010-01-29

Thanks. Strange that I didn't notice the box had changed position.


New program for vocabulary and sentence mining - kazelee - 2010-01-29

Got a server error after I clicked next.


New program for vocabulary and sentence mining - emreth - 2010-01-29

I really like this program so far; the idea is great. Mostly, what I think would be better is being able to input romaji (sometimes I just don't feel like pressing alt+shift to switch to JP input) and, like someone else said, not having the box turn green to show whether you're correct before you even submit your answer.

Also, when you're using "exercise" to get new vocabulary words, it'd be nice if, rather than just giving you a set number of words automatically, it could give you a list of words/sentences that you can choose from specifically. Great program though! Looking forward to seeing changes.


New program for vocabulary and sentence mining - gh123 - 2010-01-29

Jarvik7 Wrote:I don't intend on using this since I don't like/trust cloud services, but what is the methodology for suggesting words? Same JLPT level? frequency lists?
The previous version of this program was actually targeted at offline use, where the user could connect to a server and download updates of sentences when they wanted to. However, we did not believe that our current server could handle sending thousands of sentences to many people at the same time. So we decided to develop a web-based version for the public release.

The method for suggesting words is based on frequency lists. We generated a general frequency list from a large collection of web texts and then used these frequencies to sort words for each JLPT level. We also use frequencies from a categorized collection of web texts in order to prioritize some words over others depending on the interests you select in your profile.
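
A rough sketch of how frequency-based suggestion like this could work. The post says words in each JLPT level are sorted by a general web frequency and that per-category frequencies are used to prioritize words matching the user's interests; how the two are combined is not stated, so the additive score, the `interest_weight` parameter, and the function name `suggest_words` are all assumptions for illustration.

```python
from typing import Dict, List, Set


def suggest_words(
    level_words: List[str],
    general_freq: Dict[str, int],
    category_freq: Dict[str, Dict[str, int]],
    interests: Set[str],
    known: Set[str],
    limit: int = 5,
    interest_weight: float = 1.0,
) -> List[str]:
    """Suggest unknown words from one JLPT level, most frequent first."""

    def score(word: str) -> float:
        # Base score: frequency in the general web corpus.
        base = general_freq.get(word, 0)
        # Assumed boost: frequency within the categories the user selected.
        boost = sum(category_freq.get(c, {}).get(word, 0) for c in interests)
        return base + interest_weight * boost

    candidates = [w for w in level_words if w not in known]
    return sorted(candidates, key=score, reverse=True)[:limit]
```

With no interests selected this reduces to a plain frequency sort within the level; selecting an interest moves words that are common in that category toward the top.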

Jarvik7 Wrote:If by JLPT level I guess it's not useful for people beyond 1kyuu level.
Evil_Dragon Wrote:Seems not to be too useful for people who aced JLPT1.
This system can be used to study any word actually. When parsing sentences we detect every word used in the sentence, even those not in the JLPT. So if there are words you would like to study then simply create a vocabulary list (under the vocabulary section) and add words to it. It is true that the system does not currently support the suggestion of words outside of the JLPT. Although, since we did collect general frequencies for all words it would be relatively simple to add a level 0. But at the beginning the system would be suggesting words with high frequencies that just happen not to be in the JLPT. Therefore it does not mean that these words would necessarily be any more difficult than level 1 words.

I'll think a bit more about how to implement this and see if I can't get something working relatively soon.

zazen666 Wrote:how do I change my level without taking the level check test?
You can change your level at any time by going to Profile, then User Information, and editing your JLPT Level to something else.

zazen666 Wrote:If I get the kanji 世界 and I type せいかい, well, the box does not turn green, so I know right away that it is likely せかい. Unless I make sure to mark it wrong on my own, well, I then self-correct it and it gets marked "correct". This might reinforce bad habits. Just my 2 cents.....
This was something carried over from a previous version of the system that we haven't received a lot of feedback on yet. It is true that people could continue making a mistake like this. I'll try changing this feature this weekend.

wccrawford Wrote:The system could be massively improved with just 2 small features:

Automatically focus on the input box when you move to a new question. (Put the typing cursor in the box.)

Accept romaji input and auto-convert to kana, rather than requiring kana input.
Thanks for pointing this out. It has been fixed to properly focus now. Sorry but currently we do not have any plans to support romaji in the system. Although I see how auto-converting it for input could be faster. This could take some time to implement though so I'll have to think about it more.
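
For reference, a minimal sketch of what romaji-to-kana auto-conversion might look like. This is not part of the program; the table is deliberately partial (basic syllables, a few Hepburn spellings, doubled consonants, and a standalone n), it does not handle combinations like kya or sho, and the name `romaji_to_hiragana` is hypothetical.

```python
# Build a partial romaji -> hiragana table from the basic rows.
ROWS = {
    "": "あいうえお",
    "k": "かきくけこ", "s": "さしすせそ", "t": "たちつてと", "n": "なにぬねの",
    "h": "はひふへほ", "m": "まみむめも", "r": "らりるれろ",
    "g": "がぎぐげご", "z": "ざじずぜぞ", "d": "だぢづでど",
    "b": "ばびぶべぼ", "p": "ぱぴぷぺぽ",
}
TABLE = {c + v: kana for c, row in ROWS.items() for v, kana in zip("aiueo", row)}
TABLE.update({
    "ya": "や", "yu": "ゆ", "yo": "よ", "wa": "わ", "wo": "を",
    "shi": "し", "chi": "ち", "tsu": "つ", "fu": "ふ", "ji": "じ",
})
# Consonants that produce a small っ when doubled (n is handled separately).
DOUBLING_CONSONANTS = "bcdfghjkmprstz"


def romaji_to_hiragana(text: str) -> str:
    """Greedy longest-match conversion; unknown characters pass through."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        # Doubled consonant, e.g. "pp" in "nippon" -> っ followed by ぽ.
        if ch in DOUBLING_CONSONANTS and i + 1 < len(text) and text[i + 1] == ch:
            out.append("っ")
            i += 1
            continue
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += size
                break
        else:
            # No syllable matched: a bare "n" becomes ん, anything else is kept.
            out.append("ん" if ch == "n" else ch)
            i += 1
    return "".join(out)
```

With this table, typing nippon would produce にっぽん and sekai would produce せかい, so the answer could be checked against the expected kana reading as before.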

Thora Wrote:For people trying level 4, do the sentences contain level 4 kanji only?
Currently it is difficult to find sentences that contain only level 4 kanji or vocabulary. This is because we are mining sentences from Japanese websites that are for native speakers. However, the NHK site that we are currently mining does have some relatively easier sentences (when compared to newspapers). You can make the system use only these sentences by using the filters in the profile section.

kazelee Wrote:Got a server error after I clicked next.
Could you send me your username and the word that this error occurred on? I'll check to see what the problem was.


New program for vocabulary and sentence mining - mezbup - 2010-01-29

JLPT0 haha epic.

TBH I harvest vocab (not mine sentences) from just reading lots every day, and JxPlugin shows all kinds of ridiculous(ly good) stats on which vocab is or is not included in the JLPT, and there is a big chunk (very substantial) which is outside the JLPT lists.

IMO a pure JLPT focus is only good for JLPT specific study.


New program for vocabulary and sentence mining - yukimine - 2010-01-29

I liked it so far! The only thing I felt it was lacking was word definitions/translations. It would be nice to have an English definition for the word we're being tested on in the exercises field. I can definitely use this to train my reading, but if I'm looking to learn new words, having to look at the dictionary is a bit time consuming. Unless you meant for this site to be used with some tool like rikaichan.


New program for vocabulary and sentence mining - drivers99 - 2010-01-29

Looks pretty slick so far. I did an exercise with 5 words. I could swear that it had me type だか as the answer for 高 in 高い (because I got it wrong when I typed たか), but in the vocab list it says たかい. Is that normal? だ vs た, I mean. I see that 高 can be だか, but why is the vocab word たかい then?