
Mighty Morphin Morphology

EDIT 19Feb2013: Information for the latest version, MorphMan 3, can be found here: (1) announcement/overview post (2) wiki (3) video tutorials

EDIT: I have released a new design of the plugin, called 'Morph Man 2'. More information can be found here. The rest of this post applies to the original plugin, Morph Man 1 (aka Japanese Morphology in Anki's shared plugin list).

I just released an Anki plugin called 'Japanese Morphology', which is a suite of tools for morphological analysis. The biggest feature of the initial release is the ability to calculate how difficult a sentence is, measured by how many of its morphemes you don't know. That is, it approximates the N in the "i+N"-ness of a sentence. This is useful if you've run a show through subs2srs but only want to work on i+1 sentences.

* Note: it currently only looks at the Expression field, although you can change that in the source.

-- How to use --

First you must create a database of all the morphemes you know, called "known.db". You can do this by creating databases of morphemes from various sources and merging them (you are automatically prompted to merge new databases with known.db).

Then I'd suggest filling in the fields for the features described below (iPlusN, vocabRank), filtering by iPlusN:1, sorting by vocabRank, and then going to town on a subs2srs deck. If you run out of i+1 sentences, update known.db with the new sentences you've learned and repeat.
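To make the filter-and-sort step concrete, here is a rough sketch in Python. It is not the plugin's code: cards are modelled as plain dicts with string fields, and `next_cards_to_study` is a made-up helper name. Sorting highest vocabRank first is also my assumption, based on higher scores meaning easier words.

```python
def next_cards_to_study(cards):
    """Keep only i+1 cards, highest vocabRank (assumed easiest) first.

    `cards` is a list of dicts standing in for Anki cards (hypothetical
    model); field values are strings, so they are converted to int.
    """
    i_plus_one = [c for c in cards if int(c["iPlusN"]) == 1]
    return sorted(i_plus_one, key=lambda c: int(c["vocabRank"]), reverse=True)


sample_cards = [
    {"Expression": "a", "iPlusN": "1", "vocabRank": "3"},
    {"Expression": "b", "iPlusN": "2", "vocabRank": "9"},
    {"Expression": "c", "iPlusN": "1", "vocabRank": "7"},
]
study_order = [c["Expression"] for c in next_cards_to_study(sample_cards)]
```

Card "b" is dropped because it is i+2, and "c" comes before "a" because it has the higher vocabRank.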

-- To create a morpheme database and initial known.db from anki cards
Open a deck, go to the card browser, select all the cards you've studied, go to Actions>It's Morphin Time>Export Morphemes, select a source name (it doesn't do anything in this initial release), then a file to save it to. It will prompt you to merge with known.db. For your first export, save as 'known.db' and don't merge. Keep doing this with all the cards you know in all your decks so that known.db becomes a better approximation of your knowledge.

-- To see what morphemes a card has
Select some cards in the card browser and go to Actions>It's Morphin Time>View Morphemes or hit ctrl+V to get a popup with all the morphemes in those cards' fields. It also displays all the morphemes those cards have that aren't in your known.db under the '-----New-----' heading. Note, you can blacklist certain parts of speech to filter which morphemes appear.

The columns in the output are the morpheme, the part of speech, the sub-part of speech, and the reading. The default blacklist is punctuation and particles.

-- To find the N in the i+N'ness of your cards
Create a field in your model called 'iPlusN' and mark it to be numerically sorted.
Then select some cards in the card browser and go to Actions>It's Morphin Time>Set i+N. This will determine the morphemes in that card and see how many aren't in known.db; the result is stored in the 'iPlusN' field. Note, you can provide a parts of speech blacklist so those morphemes will be ignored in the calculations.
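The calculation boils down to a set difference. Below is a minimal sketch of the idea, not the plugin's actual implementation. It assumes the sentence has already been split into morphemes (the plugin uses MeCab for that), models a morpheme as the four columns described above, and counts each distinct unknown morpheme once. The `Morpheme` type, the `i_plus_n` name and the blacklist values (the IPADIC labels for symbols and particles) are my assumptions.

```python
from typing import NamedTuple


class Morpheme(NamedTuple):
    base: str
    pos: str       # part of speech
    sub_pos: str   # sub-part of speech
    reading: str


# Assumed default blacklist: punctuation/symbols and particles.
DEFAULT_BLACKLIST = frozenset({"記号", "助詞"})


def i_plus_n(morphemes, known, blacklist=DEFAULT_BLACKLIST):
    """Count distinct morphemes that are neither blacklisted nor known."""
    unknown = {m for m in morphemes if m.pos not in blacklist and m not in known}
    return len(unknown)


# Pre-analyzed morphemes for 猫が魚を食べる。
sentence = [
    Morpheme("猫", "名詞", "一般", "ネコ"),
    Morpheme("が", "助詞", "格助詞", "ガ"),
    Morpheme("魚", "名詞", "一般", "サカナ"),
    Morpheme("を", "助詞", "格助詞", "ヲ"),
    Morpheme("食べる", "動詞", "自立", "タベル"),
    Morpheme("。", "記号", "句点", "。"),
]
known_db = {sentence[0], sentence[4]}  # you already know 猫 and 食べる
n = i_plus_n(sentence, known_db)       # only 魚 is unknown, so this is i+1
```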

-- To rank new vocab words by difficulty
Create a field called 'vocabRank', mark it to be numerically sorted, and then select some cards in the browser and go to Actions>It's Morphin Time>Set vocabRank. The higher the number, the more similar the word is to other morphemes you've seen (in terms of shared kanji and readings). *Note: the current implementation is a rough approximation (see later posts), but it errs in favor of false low scores, so you can trust that high scores mean similar / easy.
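For a feel of what "similar in terms of kanji and readings" could mean, here is a toy scoring function. It is only an illustration and not the formula the plugin uses: the weights (10 per shared kanji, 5 for an identical reading), the `vocab_rank` name and the representation of known words as (base, reading) pairs are all assumptions.

```python
def is_kanji(ch):
    """True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def vocab_rank(word, reading, known):
    """Score `word` by its single most similar known word.

    `known` is an iterable of (base, reading) pairs. The weights are
    arbitrary illustration values, not the plugin's.
    """
    word_kanji = {c for c in word if is_kanji(c)}
    best = 0
    for known_base, known_reading in known:
        shared = len(word_kanji & set(known_base))
        score = 10 * shared + (5 if reading == known_reading else 0)
        best = max(best, score)
    return best


known_words = [("学生", "ガクセイ"), ("先生", "センセイ")]
rank_gakkou = vocab_rank("学校", "ガッコウ", known_words)  # shares 学 with 学生
rank_neko = vocab_rank("猫", "ネコ", known_words)          # nothing in common
```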

-- To see which parts of a sentence are "unknown"
I wanted an easy way to see which parts of a sentence I should focus on when testing myself, so I added an option to set an 'unknowns' field, which gets filled with the Ns in the i+N'ness of the sentence. I found it helpful to show this on my cards, below the sentence being tested and in a smaller font.

-- To create a db from a UTF-8 text file
Open MorphMan (on main window's menu bar), then Export morphemes from file, then select a file to export from and a place to save the db.

-- To analyze, compare, or merge dbs
Open MorphMan (on the main window's menu bar), browse to open one or two dbs, then click the various buttons to compare db A to B (or just show A). You can view the resulting morphemes in 4-column mode (morpheme, part of speech, sub-part of speech, and reading) or 1-column mode (just the base morpheme).

You can then save the results to a new database. This is useful if you want to have a separate db for every show you have and larger dbs that are a union of all shows of a genre, for more interesting analysis.
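Conceptually, these comparisons are just set operations. The sketch below treats a db as a Python set of (morpheme, part of speech, sub-part of speech, reading) tuples. That is an assumption about the data model, not the plugin's on-disk format, and the mode names and helper functions are made up for the example.

```python
def compare(db_a, db_b, mode):
    """Combine two morpheme sets. Mode names are hypothetical."""
    if mode == "A-B":
        return db_a - db_b          # in A but not in B
    if mode == "B-A":
        return db_b - db_a          # in B but not in A
    if mode == "intersect":
        return db_a & db_b          # in both
    if mode == "union":
        return db_a | db_b          # merged db
    raise ValueError("unknown mode: " + mode)


def one_column(db):
    """1-column view: just the sorted base morphemes."""
    return sorted({m[0] for m in db})


show_a = {("猫", "名詞", "一般", "ネコ"), ("魚", "名詞", "一般", "サカナ")}
show_b = {("魚", "名詞", "一般", "サカナ"), ("鳥", "名詞", "一般", "トリ")}

only_in_a = one_column(compare(show_a, show_b, "A-B"))
genre_db = compare(show_a, show_b, "union")  # e.g. a per-genre union of shows
```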

-- To tag facts that match against a db
You can use the It's Morphin Time>Mass Tagger feature to add one or more tags to all facts whose Expression field contains a morpheme found in a database you specify. For example, tag all facts in your subs2srs deck that contain words that are also in kore, or vice versa.
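In rough terms, the tagger does something like the following. This is a simplified sketch and not the plugin's code: facts are modelled as dicts that hold an already-analyzed set of base morphemes plus a set of tags, and `mass_tag` is a hypothetical name.

```python
def mass_tag(facts, db, tag):
    """Add `tag` to every fact that shares a morpheme with `db`.

    Returns how many facts were newly tagged.
    """
    tagged = 0
    for fact in facts:
        if fact["morphemes"] & db and tag not in fact["tags"]:
            fact["tags"].add(tag)
            tagged += 1
    return tagged


facts = [
    {"morphemes": {"猫", "食べる"}, "tags": set()},
    {"morphemes": {"走る"}, "tags": set()},
]
vocab_db = {"猫", "鳥"}  # stand-in for the morphemes of some vocab deck
newly_tagged = mass_tag(facts, vocab_db, "inVocabDeck")
```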

-- To create a personalized vocab deck
You can use the It's Morphin Time>Morph Match feature to find an optimal matching of words (morphemes from a DB) to sentences (selected cards' Expression fields) and set the best word to learn from that sentence in the 'matchedMorpheme' field.
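The changelog below calls this a maximum cardinality matcher. One standard way to compute such a matching is the augmenting-path approach (Kuhn's algorithm), sketched here. I'm not claiming this is the exact algorithm in the plugin; the function name and the dict-of-sets input are assumptions for the example. Each sentence gets at most one word and each word is assigned to at most one sentence.

```python
def match_words_to_sentences(sentence_words):
    """Maximum-cardinality bipartite matching via augmenting paths.

    `sentence_words` maps a sentence id to the set of candidate words it
    contains. Returns a dict of sentence id -> matched word.
    """
    word_owner = {}  # word -> sentence id currently holding it

    def try_assign(sentence, visited):
        for word in sorted(sentence_words[sentence]):
            if word in visited:
                continue
            visited.add(word)
            # Take the word if it is free, or if its current owner can
            # be moved to a different word.
            if word not in word_owner or try_assign(word_owner[word], visited):
                word_owner[word] = sentence
                return True
        return False

    for sentence in sentence_words:
        try_assign(sentence, set())
    return {sentence: word for word, sentence in word_owner.items()}


candidates = {
    "s1": {"犬", "猫"},
    "s2": {"犬"},
    "s3": {"猫", "鳥"},
}
matching = match_words_to_sentences(candidates)
```

A greedy pass could hand 犬 to s1 and leave s2 with nothing to learn. The augmenting step lets s1 switch to 猫 so that all three sentences end up with a word.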

-- Final notes --
The code is up on GitHub (and in your plugin directory), so feel free to use or modify it in any way: https://github.com/jre2/JapaneseStudy/tr...morphology

MeCab isn't great at determining the morphemes of loan words, so sometimes it will break them up incorrectly and wrongly inflate the i+N score of a sentence.

Please report any bugs, feature suggestions, etc.

--Changes:
v1.1: added vocabRank feature
v1.2: added unknowns feature, cleaned up menus, and renamed some files
v1.3: added MorphMan for managing dbs
v1.4: added MorphMan result saver and mass tagger
v1.5: added maximum cardinality morpheme matcher, bugfix for Anki 1.0
Edited: 2013-02-19, 10:50 pm