Mighty Morphin Morphology

#29
Splatted Wrote:You said that it's possible to give priority to certain decks, but would it be possible to give words a rating based on how many times they appear?
Unfortunately, my idea of using a full-blown assignment problem algorithm, which would let you assign 'costs' to certain pairings, is ruled out for now because it's too slow. A 100x100 cost matrix (i.e. 100 morphemes in the DB to 100 sentences in the fact selection) takes only a second, but 1000x1000 doesn't finish within an hour. I've experimented with implementing the matching algorithm in a separate program using fast C or Haskell code, but 1000x1000 still takes a while, and even slightly larger sizes quickly stop completing within an hour. I may return to this eventually, but for now I'm sticking with maximum cardinality bipartite matching (i.e. just find the largest possible set of pairings).
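
To make the "largest set of possible pairings" idea concrete, here is a minimal sketch of maximum cardinality bipartite matching using the simple augmenting-path approach. It is not MorphMan's actual code; the function name `max_bipartite_matching` and the `candidates` mapping (morpheme to the sentences that contain it) are assumptions made up for illustration.

```python
def max_bipartite_matching(candidates):
    """Pair each morpheme with at most one sentence, and vice versa.

    `candidates` (hypothetical structure) maps a morpheme to the list of
    sentences it could be paired with. Returns a dict morpheme -> sentence
    containing as many pairs as possible.
    """
    match_of_sentence = {}  # sentence -> morpheme currently paired with it

    def try_assign(morpheme, visited):
        # Look for a free sentence, or one whose current morpheme can be
        # moved to a different sentence (an augmenting path).
        for sentence in candidates.get(morpheme, []):
            if sentence in visited:
                continue
            visited.add(sentence)
            holder = match_of_sentence.get(sentence)
            if holder is None or try_assign(holder, visited):
                match_of_sentence[sentence] = morpheme
                return True
        return False

    for morpheme in candidates:
        try_assign(morpheme, set())

    # Flip the mapping so the result reads morpheme -> sentence.
    return {m: s for s, m in match_of_sentence.items()}


pairs = max_bipartite_matching({
    "a": ["s1", "s2"],
    "b": ["s1"],
})
print(pairs)
```

Note that this only maximizes the number of pairs; it has no notion of cost, which is exactly the limitation described above. The recursive search is also only suitable for small inputs; a larger problem would want an iterative or Hopcroft-Karp style implementation.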

Splatted Wrote:If possible it would be nice to be able to exclude decks from this as well, since it's obviously not relevant how many times a word appears in something like Core 6000.
Note, you can still accomplish most things just by carefully crafting your DBs and fact selection. I make a separate DB for each deck, and sometimes even for subsets of a deck, and then combine them (i.e. union) into larger DBs as well.

Splatted Wrote:It might also be useful to give more significance to words that appear in multiple decks, as they are likely to be more useful for other things as well.
The databases actually contain frequency information. The format is basically a TSV file with a 4-column morpheme entry plus a number recording how many times it was seen. I'll consider adding some frequency filter to MorphMan to make use of it.
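
As a rough sketch of what such a frequency filter could look like, assuming each row really is four tab-separated morpheme fields followed by a count: the helper names (`load_db`, `union`, `frequent`) and the sample field values are hypothetical, and this does not use MorphMan's own DB code.

```python
from collections import Counter


def load_db(tsv_text):
    """Parse rows of four morpheme fields plus a seen-count (assumed layout)."""
    counts = Counter()
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 5:
            continue  # skip blank or malformed rows
        *morpheme, seen = fields
        counts[tuple(morpheme)] += int(seen)
    return counts


def union(*dbs):
    """Combine DBs, summing the counts of morphemes that appear in several."""
    total = Counter()
    for db in dbs:
        total.update(db)
    return total


def frequent(db, min_count):
    """Keep only morphemes seen at least `min_count` times."""
    return {m: n for m, n in db.items() if n >= min_count}


# Made-up sample rows; the meaning of the four fields is a placeholder.
deck_a = load_db("f1\tf2\tf3\tf4\t3\ng1\tg2\tg3\tg4\t1\n")
deck_b = load_db("f1\tf2\tf3\tf4\t2\n")
print(frequent(union(deck_a, deck_b), min_count=2))
```

Summing counts across per-deck DBs is one way to give more weight to words that show up in multiple decks; leaving a deck out of the union would exclude it from the tally.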

Splatted Wrote:P.S. I was planning on making a Fate/Stay night deck too; where did you download it all from?
I made it myself. I'm currently working on re-timing subs from a bunch of shows I have and making decks for them. Once I get a complete set of sync'd eng/jap subs for a show I'll probably post them somewhere, FSN included.