
Nayr's Core5000 deck (Frequency Dictionary of Japanese)

#16
There are three reasons why the "Word" field was problematic:
1- "人 hito n. person, people, human being" and "若い wakai i-adj. young" are in the same field
2- "いろいろ iroiro adv., na-adj. various" has two parts of speech
3- "余り amari adv. the rest n. (not) much" and many others like it have more than one part-of-speech/translation pair.

But I think I have it working: each of the four pieces of information in the Word field is now tagged according to what it is (kana/kanji, roumaji, part of speech, or translation). The techniques used should hopefully keep working when you update the deck. For now, and for this deck in particular, Fuzzy-Anki just colors those four sub-fields differently so it is easy to verify that the algorithm worked. Try uploading the APKG (as it exists on Ankiweb.net) to http://fasiha.github.io/fuzzy-anki/
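To make the idea concrete, here is a rough Python sketch of one way to split a Word field into those four pieces. It is not necessarily how Fuzzy-Anki does it. The names `parse_word_field` and `POS_ABBREVIATIONS` are made up for illustration, and the abbreviation set only covers the examples quoted above, so the real deck would need a longer list.

```python
# Assumption: only the abbreviations seen in the examples above.
# The real deck certainly uses more; extend this set as needed.
POS_ABBREVIATIONS = {"n.", "adv.", "i-adj.", "na-adj."}


def is_pos(token):
    """True if the token is a known part-of-speech abbreviation,
    ignoring a trailing comma as in 'adv., na-adj.'."""
    return token.rstrip(",") in POS_ABBREVIATIONS


def parse_word_field(field):
    """Split 'WORD ROUMAJI POS... TRANSLATION [POS... TRANSLATION]...'
    into the word, the roumaji, and a list of senses."""
    tokens = field.split()
    if len(tokens) < 3:
        raise ValueError("expected word, roumaji and at least one more token")
    word, roumaji, rest = tokens[0], tokens[1], tokens[2:]

    senses = []
    for token in rest:
        if is_pos(token):
            if senses and not senses[-1]["translation"]:
                # Still inside a run of abbreviations: same sense.
                senses[-1]["pos"].append(token.rstrip(","))
            else:
                # An abbreviation after some translation text starts a new pair.
                senses.append({"pos": [token.rstrip(",")], "translation": []})
        else:
            if not senses:
                raise ValueError("translation text before any part of speech")
            senses[-1]["translation"].append(token)

    return {
        "word": word,
        "roumaji": roumaji,
        "senses": [
            {"pos": s["pos"], "translation": " ".join(s["translation"])}
            for s in senses
        ],
    }


if __name__ == "__main__":
    for example in (
        "人 hito n. person, people, human being",
        "いろいろ iroiro adv., na-adj. various",
        "余り amari adv. the rest n. (not) much",
    ):
        print(parse_word_field(example))
```

The weak spot of this approach is any translation that happens to contain a token from the abbreviation set, which would be misread as the start of a new pair.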

Personally, I'd like to see both (1) the kana in the Reading field and (2) the different parts of the Word field tagged with "span" HTML tags in the final deck, leaving users free to style them however they want: hide roumaji, expand part-of-speech abbreviations, etc. You could even make the deck's default styling mimic the current look exactly (kana surrounded by [] in Reading), so it would still look the same to you. If you agree but don't want to do this tagging yourself, I can figure out how to export the data as an APKG or CSV, before or after your next update.
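As a rough illustration of what that tagging could look like, here is a small Python sketch. The function names (`tag_word`, `tag_reading`), the class names (`word`, `roumaji`, `pos`, `translation`, `kana`) and the CSS are all my own placeholders, and I am assuming the Reading field marks kana as `[かな]`, as described above.

```python
import html
import re


def tag_word(word, roumaji, senses):
    """Wrap each piece of the Word field in its own span.

    `senses` is a list of (list_of_pos_abbreviations, translation) pairs.
    """
    parts = [
        f'<span class="word">{html.escape(word)}</span>',
        f'<span class="roumaji">{html.escape(roumaji)}</span>',
    ]
    for pos_list, translation in senses:
        pos = ", ".join(pos_list)
        parts.append(f'<span class="pos">{html.escape(pos)}</span>')
        parts.append(f'<span class="translation">{html.escape(translation)}</span>')
    return " ".join(parts)


def tag_reading(reading):
    """Replace every '[kana]' in the Reading field with a kana span.

    The brackets are dropped here and put back by the default CSS below.
    Text outside the brackets is left untouched.
    """
    return re.sub(
        r"\[([^\[\]]*)\]",
        lambda m: f'<span class="kana">{html.escape(m.group(1))}</span>',
        reading,
    )


# Default styling that restores the current look (kana surrounded by []).
DEFAULT_CSS = """
.kana::before { content: "["; }
.kana::after  { content: "]"; }
"""

if __name__ == "__main__":
    print(tag_word("余り", "amari", [(["adv."], "the rest"), (["n."], "(not) much")]))
    print(tag_reading("人[ひと]"))
```

With spans like these, hiding roumaji becomes a one-line rule such as `.roumaji { display: none; }` in the card styling.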

Thanks for the deck and the fun programming mini-project.
Edited: 2014-08-22, 1:00 am