kanji koohii FORUM
Programming my CS Thesis Project for Japanese Learners - Printable Version




Programming my CS Thesis Project for Japanese Learners - datrukup - 2014-10-30

Hey Y'all,

My university's Computer Science thesis is focused on software engineering rather than theory or research, so I can program just about anything I want as long as it's a comprehensive, polished program. I thought I might as well use this opportunity to make something awesome to help Japanese learners. I have a few ideas but I'm curious if anyone has any wish-lists of things they wish their current study-aid programs could do, or have any dreams of programs that would make Japanese learning easier.

By the way, we definitely need to make a demonym for kanji.koohii.com users, because it feels a little like an insult to simply call y'all "Japanese learners."


Programming my CS Thesis Project for Japanese Learners - aldebrn - 2014-10-30

datrukup Wrote:I have a few ideas but I'm curious if anyone has any wish-lists of things they wish their current study-aid programs could do, or have any dreams of programs that would make Japanese learning easier.
Awesome! My dream (please beat me to it): an HTML5 flashcard program with a nice developer API and offline support to replace Anki. It should have multiple scheduling algorithms to choose from (Leitner system like Koohii's SRS, Anki's SuperMemo-derived SM-2 algorithm [I have pseudocode to implement this in case you don't want to chase down twenty functions in the Anki source code], fancy machine learning stuff like this, or an ensemble approach balancing all of the above). If you make it open-source or if you're allowed to have collaborators, I could help by adding importing of Anki review logs/decks. Of course social integration, both internal to the site and Facebook/Twitter/et al. Integrate a dictionary into it so it knows what words you're looking up, if not what you're reading. An idea would be to team up with someone from the psychology department studying memory or educational methods and see if they're interested in long-term data that such a tool could generate (Ankiweb probably has a treasure trove of data for this purpose but it wasn't designed for research purposes), or if there are other aspects of the problem they'd like to get data on, and build that into the system.
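
The two named schedulers could be sketched roughly as follows. This is a minimal sketch, assuming the classic published SM-2 rules (not Anki's modified variant) and a Leitner variant that sends a missed card back to box 1 (an assumption, not necessarily Koohii's actual rule). The names Card, sm2_review, and leitner_review are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Card:
    repetitions: int = 0  # consecutive successful reviews so far
    interval: int = 0     # days until the next review
    ease: float = 2.5     # SM-2 "E-Factor"; 2.5 is the usual starting value


def sm2_review(card: Card, quality: int) -> Card:
    """Return the card's new state after a review graded 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease unchanged.
        return Card(repetitions=0, interval=1, ease=card.ease)
    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = round(card.interval * card.ease)
    # Ease update from the SM-2 description, floored at 1.3.
    ease = card.ease + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return Card(card.repetitions + 1, interval, max(1.3, ease))


def leitner_review(box: int, correct: bool, max_box: int = 5) -> int:
    """Leitner variant (assumed): promote one box on success, back to box 1 on a miss."""
    return min(box + 1, max_box) if correct else 1
```

Keeping each scheduler as a pure function of (state, answer) would make it fairly easy to swap algorithms per deck or to combine several in an ensemble.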

This probably doesn't matter to you but I'm thinking about paying a few 日本人 to write a few thousand kanji to make a new public-domain stroke order database. I'm tired of KanjiVG's Creative Commons restrictions, plus I'm thinking about making a kanji recognition/grading tool geared towards correct writing rather than speed. I'm hoping that by requiring users to use correct stroke order, you can dispense with using training data. Basically an open-source version of sljfaq.org's online kanji recognizer, which I love and would love to hack on. What do you think of that?

Another thing I'm working on on the side is manipulating kanji decomposition data from CJKVI (and to a much lesser extent, the lower-quality data from KanjiVG) to build a component dependency graph for kanji. See if that can be used to improve reviews (e.g., you just missed 容, so maybe accelerate the review schedules for 欲 and 浴 a little). Such a graph could also be used to design lists like RTK, where components are gradually introduced and you learn all the kanji that can be built out of them at once.
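
A toy version of such a component graph might look like this. It is only a sketch: the three-entry DECOMPOSITION table is hand-written for illustration (real data would come from CJKVI or KanjiVG, whose file formats are not reproduced here), 谷 is treated as a primitive, and the function names are hypothetical. It needs Python 3.9+ for graphlib.

```python
from graphlib import TopologicalSorter

# Hand-written toy data: kanji -> the components it is built from.
DECOMPOSITION = {
    "容": ["宀", "谷"],
    "欲": ["谷", "欠"],
    "浴": ["氵", "谷"],
}


def siblings(kanji, decomposition):
    """Kanji sharing at least one component with `kanji` (candidates for earlier review)."""
    parts = set(decomposition.get(kanji, []))
    return sorted(
        other
        for other, comps in decomposition.items()
        if other != kanji and parts & set(comps)
    )


def learning_order(decomposition):
    """An RTK-style order in which every component appears before the kanji that use it."""
    return list(TopologicalSorter(decomposition).static_order())


print(siblings("容", DECOMPOSITION))  # kanji that share 宀 or 谷 with 容
print(learning_order(DECOMPOSITION))
```

A scheduler could call siblings() after a miss and shorten the intervals of the returned kanji; how much to shorten them is left open here.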

Open to feedback on my feedback :) and have fun in grad school!

datrukup Wrote:By the way, we definitely need to make a demonym for kanji.koohii.com users, because it feels a little like an insult to simply call y'all "Japanese learners."
I am trying to channel Imgur by saying "Hi Koohii".


Programming my CS Thesis Project for Japanese Learners - yogert909 - 2014-10-30

Holy crap aldebrn, I was thinking about the same kind of thing, but I wouldn't have the skills to pull it off like you or datrukup could. I almost emailed you a month ago about brainstorming a company around the idea. However, I'm wondering if Duolingo is doing something similar, just without a Japanese class yet.

My idea was to build a platform that would learn how people writ large learn and also learn how each individual learns. So it would be another SRS that would automatically upload to the cloud, and then you would be able to crunch the data and determine things like exactly what interval curve allows for the most retained knowledge per unit of time, how time of day affects learning... Hopefully a lot of the analytics could be automated. And of course it would have the kind of machine learning system that aldebrn linked. You would be able to run experiments where you could change the order of certain cards to see if it's more efficient, and you could see if mixing in grammar and sentences is more or less efficient and at what ratio to mix, etc... The idea is that the platform is a data-acquisition/self-adjusting platform first and an SRS second. I've got a bunch of ideas written down, so if you end up wanting to do something like this, email me and I'll give you the full brain dump.
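
As a very small illustration of the kind of analytics described above, pooled review logs could be bucketed by interval to see how recall drops off. This is a sketch under assumptions: the log format (interval in days, whether the card was recalled) and the function name retention_by_interval are made up for the example.

```python
from collections import defaultdict


def retention_by_interval(reviews, bucket_days=7):
    """Map each interval bucket (start day) to the fraction of reviews recalled in it.

    `reviews` is an iterable of (interval_days, recalled) pairs, a hypothetical log format.
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [recalled count, total count]
    for interval_days, recalled in reviews:
        bucket = (interval_days // bucket_days) * bucket_days
        totals[bucket][0] += int(recalled)
        totals[bucket][1] += 1
    return {bucket: hits / n for bucket, (hits, n) in sorted(totals.items())}


log = [(1, True), (3, True), (6, False), (8, True), (13, False)]
print(retention_by_interval(log))
```

The same grouping could be keyed on time of day or card type to run the comparisons mentioned above.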


Programming my CS Thesis Project for Japanese Learners - yogert909 - 2014-10-30

aldebrn Wrote:
datrukup Wrote:By the way, we definitely need to make a demonym for kanji.koohii.com users, because it feels a little like an insult to simply call y'all "Japanese learners."
I am trying to channel Imgur by saying "Hi Koohii".
"Hi Koohii" I like that.


koohito..?
koohiista..?


Programming my CS Thesis Project for Japanese Learners - rich_f - 2014-10-30

If you put Duolingo in Japanese mode, and set it to learn whatever language you speak, it kinda/sorta works. Kinda.

To the OP: good luck with the Master's Project! I did one a while back, and the lit review was totally not fun. (But it was better than writing a thesis!)


Programming my CS Thesis Project for Japanese Learners - aldebrn - 2014-11-05

OP, what'd you decide on?


Programming my CS Thesis Project for Japanese Learners - toshiromiballza - 2014-11-05

aldebrn Wrote:I'm tired of KanjiVG's Creative Commons restrictions
What's restrictive about it?


Programming my CS Thesis Project for Japanese Learners - aldebrn - 2014-11-05

toshiromiballza Wrote:
aldebrn Wrote:I'm tired of KanjiVG's Creative Commons restrictions
What's restrictive about it?
Only one: non-commercial. There's occasionally ugly traffic on the mailing list about this ebook or that app that is sold and that uses KanjiVG. I don't plan on doing anything commercial, but, meh, who needs that. I use unlicense.org for all my work anyway. Hence my interest in a public-domain stroke order database (though KanjiVG is much more than that, containing component groupings and variants, etc., very interesting stuff).


Programming my CS Thesis Project for Japanese Learners - toshiromiballza - 2014-11-05

It says you can freely use it for any purpose, even commercially: http://creativecommons.org/licenses/by-sa/3.0/

As long as you distribute it under the same license, that is...


Programming my CS Thesis Project for Japanese Learners - aldebrn - 2014-11-05

toshiromiballza Wrote:It says you can freely use it for any purpose, even commercially: http://creativecommons.org/licenses/by-sa/3.0/

As long as you distribute it under the same license, that is...
Fuuuu... when did this lose the "NC" non-commercial qualifier? Pretty fricking awesome, thanks for clearing up my ignorance!

Edit: in 2009, you dope!


Programming my CS Thesis Project for Japanese Learners - Sebastian - 2014-11-05

aldebrn Wrote:Another thing I'm working on on the side is manipulating kanji decomposition data from CJKVI (and to a much lesser extent, the lower-quality data from KanjiVG) to build a component dependency graph for kanji. See if that can be used to improve reviews (e.g., you just missed 容, so maybe accelerate the review schedules for 欲 and 浴 a little). Such a graph could also be used to design lists like RTK, where components are gradually introduced and you learn all the kanji that can be built out of them at once.
That last part would be a great learning tool.

Imagine that you could feed it JLPT vocabulary, Kanken vocabulary, vocabulary from the textbook you're using, or even Japanese novels, websites, or whatever you want to read, and get a list of all necessary kanji, in order of graphic complexity.
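
The idea above could be sketched like this, assuming "graphic complexity" is approximated by stroke count (an assumption; depth in a component graph would be another option). The stroke-count table is supplied by the caller, the character range covers only the basic CJK Unified Ideographs block, and the function names are hypothetical.

```python
def kanji_in(text):
    """Set of characters in `text` from the basic CJK Unified Ideographs block."""
    return {ch for ch in text if "\u4e00" <= ch <= "\u9fff"}


def study_list(texts, complexity):
    """All kanji needed for `texts`, simplest first.

    `complexity` maps kanji to a score such as stroke count; unknown kanji sort last.
    """
    needed = set().union(*(kanji_in(t) for t in texts))
    return sorted(needed, key=lambda ch: (complexity.get(ch, float("inf")), ch))


strokes = {"日": 4, "本": 5, "語": 14, "読": 14}  # small hand-entered table
print(study_list(["日本語を読む"], strokes))
```

Kana and punctuation are ignored, so the same function works on a vocabulary list, a textbook chapter, or a scraped web page.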


Programming my CS Thesis Project for Japanese Learners - vileru - 2014-11-06

aldebrn Wrote:
datrukup Wrote:I have a few ideas but I'm curious if anyone has any wish-lists of things they wish their current study-aid programs could do, or have any dreams of programs that would make Japanese learning easier.
Awesome! My dream (please beat me to it): an HTML5 flashcard program with a nice developer API and offline support to replace Anki. It should have multiple scheduling algorithms to choose from (Leitner system like Koohii's SRS, Anki's SuperMemo-derived SM-2 algorithm [I have pseudocode to implement this in case you don't want to chase down twenty functions in the Anki source code], fancy machine learning stuff like this, or an ensemble approach balancing all of the above). If you make it open-source or if you're allowed to have collaborators, I could help by adding importing of Anki review logs/decks. Of course social integration, both internal to the site and Facebook/Twitter/et al. Integrate a dictionary into it so it knows what words you're looking up, if not what you're reading. An idea would be to team up with someone from the psychology department studying memory or educational methods and see if they're interested in long-term data that such a tool could generate (Ankiweb probably has a treasure trove of data for this purpose but it wasn't designed for research purposes), or if there are other aspects of the problem they'd like to get data on, and build that into the system.

This probably doesn't matter to you but I'm thinking about paying a few 日本人 to write a few thousand kanji to make a new public-domain stroke order database. I'm tired of KanjiVG's Creative Commons restrictions, plus I'm thinking about making a kanji recognition/grading tool geared towards correct writing rather than speed. I'm hoping that by requiring users to use correct stroke order, you can dispense with using training data. Basically an open-source version of sljfaq.org's online kanji recognizer, which I love and would love to hack on. What do you think of that?

Another thing I'm working on on the side is manipulating kanji decomposition data from CJKVI (and to a much lesser extent, the lower-quality data from KanjiVG) to build a component dependency graph for kanji. See if that can be used to improve reviews (e.g., you just missed 容, so maybe accelerate the review schedules for 欲 and 浴 a little). Such a graph could also be used to design lists like RTK, where components are gradually introduced and you learn all the kanji that can be built out of them at once.
I would pay a lot of money for this. I'd pay even more if it accommodated several languages (ancient Greek, Chinese, and German still await), and I wouldn't be surprised if a wildly successful Kickstarter campaign got off the ground—just look at Fluent Forever. It's rare for me to feel sad for not having something that doesn't even exist, but that sums up my feelings towards this not-yet-existing webapp.

Also, a Google search function would make it even more awesome. As for the dictionary lookup tool, I'd love for there to be search options for multiple dictionaries. This would be a godsend, since I usually use Weblio for definitions, ALC for collocations, and Linguee for example sentences. Even better, customizable search options.

By the way, I enjoyed admiring the beautiful LaTeX-formatted papers you linked. Please feel free to frequently post links to such objects of beauty.