Research project: Optimising the Study of Japanese Technical/Topic-Specific Vocab

Hello Koohii – I'm an MSc student of Computer Science at the University of Bristol and for my research project, I've built a system which (hopefully!) helps one learn all the vocabulary for a chosen topic (such as 'economics', 'politics', or even 'Pokémon') in Japanese. I built it to help people learn how to read difficult sections of newspapers, or niche topics which textbooks simply don't cover (like Viking puppetry). I need to evaluate its effectiveness, so I am recruiting Japanese learners to try out the software. If results are good, I may be able to develop it even further, possibly even porting to other languages!

How it works
  • You tell the system what level of Japanese you already understand (for example, JLPT N3). Note: natives are also welcome to try it!
  • You specify your topic of interest, e.g. 経済 (economics). It must, however, be a valid category name on Japanese Wikipedia.
  • The system searches for Wikipedia articles under that category, and any subcategories (recursively), extracting the text and counting all the words beyond your stated proficiency level.
  • It generates a vocabulary list of all the words above your proficiency level, sorted in word frequency order. The list provides straight-from-source example sentences, part-of-speech tagging, JLPT level estimates, and dictionary definitions for all the words.
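
For the curious, the steps above can be sketched roughly as follows. This is only a minimal illustration and not the actual implementation: the toy category graph, the pre-tokenised articles, and the names `SUBCATEGORIES`, `ARTICLES`, `KNOWN_WORDS`, `collect_categories` and `build_vocab_list` are all hypothetical stand-ins. The real system fetches text from Japanese Wikipedia and tokenises it with kuromoji.

```python
from collections import Counter, deque

# Hypothetical toy data standing in for Japanese Wikipedia (assumption:
# the real system fetches categories and article text over the network).
SUBCATEGORIES = {
    "経済": ["金融"],
    "金融": ["経済"],  # deliberate cycle, since category graphs can loop
}

# Hypothetical articles, already split into tokens (assumption: the real
# system would obtain these tokens from the kuromoji tokeniser).
ARTICLES = {
    "経済": [["経済", "は", "需要", "と", "供給", "で", "動く"]],
    "金融": [["金融", "政策", "は", "需要", "に", "影響", "する"]],
}

# Hypothetical set of words the learner already knows at their level.
KNOWN_WORDS = {"は", "と", "で", "に", "する", "動く"}


def collect_categories(root):
    """Return the root category plus all subcategories reachable from it."""
    seen = {root}
    queue = deque([root])
    while queue:
        category = queue.popleft()
        for sub in SUBCATEGORIES.get(category, []):
            if sub not in seen:  # guard against cycles in the graph
                seen.add(sub)
                queue.append(sub)
    return seen


def build_vocab_list(root, known_words):
    """Count words beyond the learner's level and sort by frequency."""
    counts = Counter()
    for category in collect_categories(root):
        for tokens in ARTICLES.get(category, []):
            counts.update(t for t in tokens if t not in known_words)
    # Most frequent first; ties are broken by the word itself so the
    # ordering is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


vocab = build_vocab_list("経済", KNOWN_WORDS)
for word, count in vocab:
    print(word, count)
```

The `seen` set is there because a subcategory can link back to one of its parents, so a naive recursive walk might never finish.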

The technology

It's built upon open-source technologies: the kuromoji tokeniser (which behaves much like MeCab, as it uses the same Viterbi algorithm and the same 'ipadic' dictionary) and EDRDG's JMdict and JMnedict. JLPT levels of vocabulary are estimated from Jonathan Waller's JLPT resources page. If I've missed anything in this quick post, a full description of the technologies, access to the source, and licensing will be detailed in my thesis (to be submitted on September 15th, and available on request as a digital copy thereafter).

Evaluating its effectiveness (participation)

This is where you come in. I need two groups of Japanese learners:
  • one group to test how much one's vocabulary improves by studying from the generated list (for those who'd like to try out the software);
  • and another group as a control, to test whether simply learning in one's own style with reference to the same list of Wikipedia articles would be any better. If you happen to be working towards learning some topic already but don't want to use the software, why not log your before & after performance and help by joining as a control?

Participants in each group may study for as long or as short a time as they choose – be it minutes, hours, or days – and will be compared against others who chose to study for the same amount of time. Of course, the longer participants study, the clearer the results may be. However, I intend to collect data on or around August 18th, so please stop any further revision on that date and make sure you take your 'after revision' quiz!
Note: participation is anonymous!


[Image: 0UhqnDY.png]

How to access it

Visit my web app. Enter the details below:
  • username: thesis
  • password: deadline

From there, just follow the instructions! The site will be accessible until around August 18th, so please take PDF copies of your generated vocabulary lists if they are valuable to you. If you have any problems (particularly should the server crash without my realising!), by all means report them as soon as possible to me through a private message or thread response.
Thank you ever so much if you are able to help out!
Edited: 2016-07-29, 9:43 am
This looks really cool, and I'm sorry I missed the post while the site was still up. I hope the research is going well. Do you plan to re-open access to this or a comparable tool at some point? I will bookmark this thread and remain hopeful. :-)
Wow, surprised no one originally replied to this, bamboo. It looks like an awesome idea. Keep us updated if you can!