
Mighty Morphin Morphology

Update:

I found a Python segmenter here:
"Jieba" Chinese text segmentation

Any help with implementing it, or pointers on how to do so, would be useful. I think a Chinese MorphMan would be of great benefit to the Chinese learning community.
Edited: 2014-02-12, 11:26 pm
overture2112 Wrote:
lesson17lesson17 Wrote:
overture2112 Wrote:Not out of the box, but if you had a Chinese equivalent of `mecab` (the Japanese morphological analysis tool that MorphMan uses) then it could probably be adapted.
What if the Chinese sentences were pre-segmented?
You'd lose out on the part-of-speech stuff, but otherwise it could work. Try looking at morphemes.py and rewriting `getMorphemes`.
I seem to have got MorphMan working with Chinese (including POS tagging) by following your recommendation. Using jieba (https://github.com/fxsjy/jieba#jieba-1), I changed the getMorphemes function in morphemes.py as follows:

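Code:
# at the top of morphemes.py
import re
from jieba import posseg  # jieba's part-of-speech segmenter

@memoize
def getMorphemes( e, ws=None, bs=None ): # Str -> PosWhiteList? -> PosBlackList? -> IO [Morpheme]
    # keep only runs of CJK ideographs and ASCII letters/digits, i.e. strip punctuation and whitespace
    e = u''.join( re.findall( ur'[\u4e00-\u9fffa-zA-Z0-9]+', e ) )
    # one Morpheme per jieba token: surface form, POS flag, 'N/A' for the unused slots (ws/bs are ignored)
    ms = [ Morpheme( m.word, u'N/A', m.flag, u'N/A', u'N/A' ) for m in posseg.cut( e ) ]
    return ms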
It seems to work once you import Python's re module and jieba's posseg. I have tested it with my current decks and the Chinese core deck, and it seems to be working.
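For anyone who wants to try the two steps of that function outside Anki, here is a minimal Python 3 sketch. `strip_punctuation` and `to_morpheme_tuples` are hypothetical helper names, the 5-tuple simply mirrors the `Morpheme(...)` call above rather than MorphMan's actual class, and the segmenter output is passed in as (word, flag) pairs so the sketch runs without jieba installed.

```python
import re

# Same character class as the post: CJK Unified Ideographs plus ASCII letters/digits.
KEEP = re.compile(r"[\u4e00-\u9fffa-zA-Z0-9]+")

def strip_punctuation(text):
    """Drop every character outside KEEP (punctuation, spaces, kana, etc.)."""
    return "".join(KEEP.findall(text))

def to_morpheme_tuples(tokens):
    """Map (word, pos_flag) pairs to 5-tuples shaped like the Morpheme(...) call above.

    Only the surface form and the POS flag come from the segmenter; the other
    three slots get the same 'N/A' placeholders the post uses.
    """
    return [(word, "N/A", flag, "N/A", "N/A") for word, flag in tokens]
```

Inside Anki you would feed it jieba's output, e.g. `to_morpheme_tuples((t.word, t.flag) for t in posseg.cut(strip_punctuation(text)))`. One thing to be aware of: because the kept runs are joined with an empty string, neighbouring ASCII words get glued together ("abc 123" becomes "abc123").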
Edited: 2014-02-13, 12:28 am
lesson17lesson17 Wrote:I have tested with my current decks and the Chinese core deck and it seems to be working.
Awesome. If I ever start studying new languages in the future, perhaps I'll work on improving support for using MorphMan with different languages. In the meantime, feel free to report any issues you run into.
I was reading through the wiki and noticed this line:

(Note that the Japanese Support plugin must also be installed for MorphMan 3 to work)

Why is the plug-in required? (I'm asking to make sure I'm not missing something with Chinese support)
This plugin is awesome, thanks for making it!

I have a question: how does the 'next new card' feature work? I have it set to True, but my cards with "due 999" still keep showing up.
Edited: 2014-03-18, 1:23 pm
I guess I'm done with this add-on. I was pretty impressed by MorphMan's capabilities and spent a couple of days trying to figure it out, but it's still not working:

Traceback (most recent call last):
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morphman.py", line 6, in onMorphManRecalc
morph.main.main()
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\main.py", line 199, in main
allDb = mkAllDb( cur )
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\main.py", line 57, in mkAllDb
ms = getMorphemes( fieldValue )
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\util.py", line 119, in __call__
value = self.func(*args)
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\morphemes.py", line 82, in getMorphemes
ms = [ tuple( m.split('\t') ) for m in interact( e ).split('\r') ] # morphemes
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\util.py", line 119, in __call__
value = self.func(*args)
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\morphemes.py", line 66, in interact
p = mecab()
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\util.py", line 119, in __call__
value = self.func(*args)
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\morphemes.py", line 59, in mecab
if not MECAB_ENCODING: MECAB_ENCODING = getMecabEncoding()
File "C:\Documents and Settings\Acer\My Documents\Anki\addons\morph\morphemes.py", line 54, in getMecabEncoding
return runMecabCmd( [ '-D' ] ).stdout.readlines()[2].lstrip( 'charset:' ).lstrip()
IndexError: list index out of range

I'm sure I've got everything right in the config file, but it produces no results at all, either when I run a recalc or when I hit the view morphemes button. I'm totally lost now.
PLEASE HELP ME...
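For what it's worth, the last frame shows `getMecabEncoding` taking line index 2 of the output of `mecab -D`, so the IndexError means that command printed fewer than three lines, which suggests MeCab itself isn't running properly (later in the thread, another user's similar traceback went away once the Japanese Support plugin was installed). Below is a hypothetical, more defensive version of that parsing step, sketched in Python 3 and assuming the usual `key:<tab>value` line format of `mecab -D`: it searches for the `charset:` line and raises a readable error instead of an IndexError.

```python
def parse_mecab_charset(lines):
    """Return the dictionary charset from `mecab -D` output lines.

    Searches for the 'charset:' line instead of assuming it is the third
    line, and fails with a readable message when it is missing
    (e.g. when MeCab could not be run at all).
    """
    prefix = "charset:"
    for line in lines:
        if line.startswith(prefix):
            # Slice off the literal prefix; str.lstrip('charset:') would strip
            # any leading characters found in that set, not the prefix itself.
            return line[len(prefix):].strip()
    raise RuntimeError(
        "no 'charset:' line in `mecab -D` output; is MeCab installed "
        "(e.g. via the Japanese Support plugin)? Got: %r" % (lines,)
    )
```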
lesson17lesson17: If you've got it working, you should consider forking the project on GitHub and publishing it as a Chinese add-on for Anki.

Also, a few pages back someone said that this error comes up because I hadn't defined note types in config.py:

[...]line 175, in updateNotes
TAG.register( tagNames )
UnboundLocalError: local variable 'tagNames' referenced before assignment

However, I only use the Japanese note type with a few extra fields, so I don't see why the default doesn't work:

'morph_fields': [u'Expression'],

I switched to OS X and another plugin has stopped working since. I don't know how that could be related though. Can you help me out?


PS: I might send this project some Dogecoin if it works. Maybe you could paste an address on your GitHub and Anki plugin pages :)
Edited: 2014-05-07, 8:16 pm
I set up my config.py to use the note types that I want but I keep getting this error. It's pretty similar to JusBlueboy's error up until the end. Any ideas?

Traceback (most recent call last):
File "/Users/seandinger/Documents/Anki/addons/morphman.py", line 6, in onMorphManRecalc
morph.main.main()
File "/Users/seandinger/Documents/Anki/addons/morph/main.py", line 199, in main
allDb = mkAllDb( cur )
File "/Users/seandinger/Documents/Anki/addons/morph/main.py", line 57, in mkAllDb
ms = getMorphemes( fieldValue )
File "/Users/seandinger/Documents/Anki/addons/morph/util.py", line 119, in __call__
value = self.func(*args)
File "/Users/seandinger/Documents/Anki/addons/morph/morphemes.py", line 82, in getMorphemes
ms = [ tuple( m.split('\t') ) for m in interact( e ).split('\r') ] # morphemes
File "/Users/seandinger/Documents/Anki/addons/morph/util.py", line 119, in __call__
value = self.func(*args)
File "/Users/seandinger/Documents/Anki/addons/morph/morphemes.py", line 66, in interact
p = mecab()
File "/Users/seandinger/Documents/Anki/addons/morph/util.py", line 119, in __call__
value = self.func(*args)
File "/Users/seandinger/Documents/Anki/addons/morph/morphemes.py", line 59, in mecab
if not MECAB_ENCODING: MECAB_ENCODING = getMecabEncoding()
File "/Users/seandinger/Documents/Anki/addons/morph/morphemes.py", line 54, in getMecabEncoding
return runMecabCmd( [ '-D' ] ).stdout.readlines()[2].lstrip( 'charset:' ).lstrip()
File "/Users/seandinger/Documents/Anki/addons/morph/morphemes.py", line 50, in runMecabCmd
s = subprocess.Popen( cmd + args, bufsize=-1, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, startupinfo=si )
File "subprocess.pyc", line 709, in __init__
File "subprocess.pyc", line 1307, in _execute_child
File "subprocess.pyc", line 476, in _eintr_retry_call
OSError: [Errno 22] Invalid argument

Edit: Turns out I hadn't installed the Japanese Support plugin. After that, it was smooth sailing. Great app, btw :)
Edited: 2014-05-29, 8:58 pm
Bug: each time I open the card browser (with either 'B' or 'L'), memory usage grows by 8-10 MB. Without MorphMan, memory consumption occasionally goes back down; with MorphMan, it only goes up until Anki crashes.
@overture2112, I'm not sure if this feature is already built in somehow, but I wanted the ability to include some decks as input to MM3 without having them be re-scheduled by MM3, so I added a "readonly" option.

e.g.

model_overrides = {
'subs2srs': { 'enabled':True },
'Japanese': { 'readonly':True },
'k2k Japanese': { 'readonly':True },
}

Then I changed the logic so decks with 'readonly' == True participate in the morpheme input phase but not in the output phase.
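A minimal sketch of that input/output split in Python 3, assuming overrides are looked up by note type name and that a note type with no override entry is neither enabled nor readonly (the real defaults live in config.py and may differ). `option`, `is_input` and `is_output` are hypothetical helpers, not MorphMan functions.

```python
# Assumed defaults for note types that have no override entry.
DEFAULTS = {"enabled": False, "readonly": False}

model_overrides = {
    "subs2srs": {"enabled": True},
    "Japanese": {"readonly": True},
    "k2k Japanese": {"readonly": True},
}

def option(model_name, key):
    """Look up an override for this note type, falling back to DEFAULTS."""
    return model_overrides.get(model_name, {}).get(key, DEFAULTS[key])

def is_input(model_name):
    """Readonly note types still feed their morphemes into the database."""
    return option(model_name, "enabled") or option(model_name, "readonly")

def is_output(model_name):
    """Only enabled, non-readonly note types get tagged and rescheduled."""
    return option(model_name, "enabled") and not option(model_name, "readonly")
```

With the overrides above, "Japanese" contributes morphemes but is never rescheduled, while "subs2srs" does both.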
Do any of you know how to add a custom dictionary to Mecab? It would be good to be able to add a list of proper names that Mecab would otherwise mis-interpret.

It seems there's a way to specify custom dictionaries but I don't know the details of how to make a dictionary.
Hello, I'm running into the UnboundLocalError that others have reported without a clear solution: "local variable 'tagNames' referenced before assignment"

I am using the deck "Genki 1 & 2 Second Edition" to test this plugin, and have changed the note type to Japanese-65563 and it has an Expression field which is the default morph_field in config.py. So, my config.py is vanilla, except for the override below

model_overrides = {
'subs2srs': { 'enabled':True },
'JtMW': { 'enabled':True },
'Genki 1 & 2 Second Edition':{'enabled':True},
}

I have read the wiki and watched the video to see if there was anything else I could do, but I must have missed something.

I am using Anki Version 2.0.28 and MorphMan v3.3.

Thanks in advance.

edit: I noticed in main.py tagNames should get assigned in this bit:
compTag, vocabTag, notReadyTag, alreadyKnownTag = tagNames = C('tag_comprehension'), C('tag_vocab'), C('tag_notReady'), C('tag_alreadyKnown')

I should also mention that I have tagged cards as alreadyKnown, both in the Browser and via 'k' while studying.
Edited: 2014-07-29, 1:43 am
vosmiura Wrote:Do any of you know how to add a custom dictionary to Mecab? It would be good to be able to add a list of proper names that Mecab would otherwise mis-interpret.

It seems there's a way to specify custom dictionaries but I don't know the details of how to make a dictionary.
There's a Java library called Kuromoji that supports MeCab and custom dictionaries. It may or may not help you figure out how to create custom dictionaries for MeCab, but here's the site: http://www.atilika.org/
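As far as I know, a MeCab user dictionary is compiled from a CSV file with one word per line, and for IPAdic-style dictionaries each row has 13 columns: surface form, left and right context IDs, cost, four part-of-speech columns, conjugation type and form, base form, reading, and pronunciation. The Python 3 sketch below only generates such rows for a list of proper names. The context IDs and cost are placeholders you would have to choose yourself (e.g. from the dictionary's left-id.def/right-id.def), and `userdic_csv` is a hypothetical helper.

```python
import csv
import io

def userdic_csv(names, left_id, right_id, cost):
    """Build IPAdic-style user-dictionary CSV text for proper names.

    names: iterable of (surface, katakana_reading) pairs.
    left_id, right_id, cost: placeholders; pick values that suit your dictionary.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for surface, reading in names:
        writer.writerow([
            surface, left_id, right_id, cost,
            "名詞", "固有名詞", "人名", "一般",  # POS: noun, proper noun, person name, general
            "*", "*",                          # conjugation type/form: not applicable
            surface, reading, reading,         # base form, reading, pronunciation
        ])
    return out.getvalue()
```

The resulting file then has to be saved in the dictionary's character set and compiled with MeCab's `mecab-dict-index` tool (something like `mecab-dict-index -d <system dic dir> -u names.dic -f utf-8 -t utf-8 names.csv`), after which it can be passed to MeCab with `-u names.dic`. Check the MeCab documentation for the exact paths on your system.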
jeffberhow Wrote:edit: I noticed in main.py tagNames should get assigned in this bit:
It doesn't get assigned when that part of the loop is skipped, which happens when your decks aren't enabled. To fix it, you can add a single line below the matureDb declaration:

tagNames = []
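To see why that works, here is a stripped-down Python 3 reproduction of the pattern. The function names, note structure and tag values are simplified stand-ins, not MorphMan's real code: the name is bound only after a `continue` guard, so if every note is skipped it is never bound, and using it afterwards raises UnboundLocalError.

```python
TAGS = ["comprehension", "vocab", "notReady", "alreadyKnown"]  # stand-in values

def update_notes_buggy(notes, enabled_models):
    for note in notes:
        if note["model"] not in enabled_models:
            continue              # every skipped note bypasses the assignment
        tag_names = TAGS
    return list(tag_names)        # UnboundLocalError if no note was enabled

def update_notes_fixed(notes, enabled_models):
    tag_names = []                # the one-line fix: bind the name up front
    for note in notes:
        if note["model"] not in enabled_models:
            continue
        tag_names = TAGS
    return list(tag_names)
```

Note that the fix only stops the crash: if no note type is enabled, nothing gets tagged, so it's still worth checking your model_overrides.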
jeffberhow Wrote:Hello, I'm having an issue with the UnboundError that others have had without a clear solution. The UnboundError is "local variable 'tagNames' referenced before assignment"
I recall getting this error when I had no enabled decks.

In 'model_overrides' you have to use the card model name rather than the deck name, e.g.

model_overrides = {
'subs2srs': { 'enabled':True },
'JtMW': { 'enabled':True },
'Japanese-65563':{'enabled':True},
}
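Since this mistake seems to be common, a quick sanity check is to compare the keys of model_overrides with the note type names listed under Tools > Manage Note Types. Here is a hypothetical Python 3 helper (not part of MorphMan) that assumes an exact, case-sensitive match is required:

```python
def unmatched_overrides(model_overrides, note_type_names):
    """Return override keys that don't match any existing note type name."""
    known = set(note_type_names)
    return sorted(name for name in model_overrides if name not in known)
```

For example, with the overrides from the post above and note types "Basic", "Japanese-65563" and "subs2srs", it would flag "Genki 1 & 2 Second Edition" (a deck name) and "JtMW".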
Thanks, vosmiura! That was the ticket.
Hey, I used this app for a while, but I had difficulty understanding what the overrides did. I stopped using it because I couldn't figure out how to stop MorphMan from pulling data from all my decks (I ended up creating a different profile for them). In short, I just want to use it for one deck and ignore the others.
Just a quick question, do I have to recalculate after I finish my reviews each day? Or is the initial calculation enough for the entire life of the deck provided I don't add new cards to the deck?

Also, is it possible to use MorphMan to filter cards when adding them to a deck in bulk?

For example, say I want to add the core6k deck or another shared deck to my own self-created deck, but I don't want to review things that I already know.

So when I add core6k (for example) to my own deck, could I use MorphMan to tag or delete the cards that haven't been reviewed yet but introduce no new material?
Edited: 2014-10-06, 4:42 am
Is there an official place to request new features? Basically, I just want the ability to tag cards that introduce no new morphemes and haven't been learned yet.

This would let people add large numbers of cards from subs2srs etc. and easily delete the cards that don't teach new material.
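The core check behind that request is simple to state: a card you haven't reviewed yet teaches nothing new if every one of its morphemes is already in your known set. Here is a minimal Python 3 sketch that treats morphemes as plain strings and uses hypothetical function names; MorphMan itself works on its own morpheme databases, not on this code.

```python
def unknown_morphemes(card_morphemes, known):
    """Morphemes on this card that aren't in the known set."""
    return set(card_morphemes) - set(known)

def cards_with_nothing_new(cards, known):
    """IDs of unreviewed cards whose morphemes are all already known.

    cards: iterable of (card_id, morphemes, reviewed) tuples.
    """
    return [card_id for card_id, morphs, reviewed in cards
            if not reviewed and not unknown_morphemes(morphs, known)]
```

In practice the known set would presumably come from your mature cards (what MorphMan keeps in known.db), and you would then tag or delete the returned cards.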
Is there no real start-up guide for this? Sorry to be another idiot but I really just can't get this thing working... I haven't edited the config file at all, which seems to be listed as Step 1 everywhere, but I don't know if that's necessary for it to work or if that's just customization so you can personalize how the mighty morph works for you. Anyway, whenever I try to run the recalc I get this error--

Quote:An error occurred in an add-on.
Please post on the add-on forum:
https://anki.tenderapp.com/discussions/add-ons

Traceback (most recent call last):
File "C:\Users\3jesse\Documents\Anki\addons\morphman.py", line 6, in onMorphManRecalc
morph.main.main()
File "C:\Users\3jesse\Documents\Anki\addons\morph\main.py", line 208, in main
updateNotes( allDb )
File "C:\Users\3jesse\Documents\Anki\addons\morph\main.py", line 175, in updateNotes
TAG.register( tagNames )
UnboundLocalError: local variable 'tagNames' referenced before assignment
I tried to load all.db as DB A and known.db as DB B and compare them, but nothing came up... I think the files are empty, but I'm not sure. I also made a list of all my mature vocab cards and saved it over known.db, because I was originally reading the documentation for v1, where you had to do that manually. The v1 docs seemed the most friendly, but they're out of date, so I guess that's no good.

Sorry to be too stupid to use your wonderful program!
Just a suggestion based on all the posts here, but it seems like there's a need for very basic instructions on how to edit the config file (with a sample). I'm having the same errors because I have no idea how to edit the config file for my own decks.
This is my error:

Traceback (most recent call last):
File "C:\Users\USER\Documents\Anki\addons\morphman.py", line 6, in onMorphManRecalc
morph.main.main()
File "C:\Users\USER\Documents\Anki\addons\morph\main.py", line 199, in main
allDb = mkAllDb( cur )
File "C:\Users\USER\Documents\Anki\addons\morph\main.py", line 47, in mkAllDb
fieldValue = getMecabField( fieldName, flds, mid )
File "C:\Users\USER\Documents\Anki\addons\morph\main.py", line 17, in getMecabField
return stripHTML( splitFields( flds )[ idx ] )
TypeError: list indices must be integers, not NoneType

Anyone know what I'm doing wrong?
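The last line indexes the note's split fields with `idx`, and the error says `idx` is None, which most likely means the field named in morph_fields (Expression by default) wasn't found on that note type. Here is a hypothetical Python 3 sketch of a field lookup that fails with a clearer message. It assumes, as Anki does, that a note's fields are stored in one string separated by the 0x1f character; `field_value` is not a MorphMan function.

```python
FIELD_SEP = "\x1f"  # Anki joins a note's field values with this separator

def field_value(field_name, field_names, flds):
    """Return the value of field_name from a note's joined fields string.

    field_names: the note type's field names, in order.
    Raises KeyError naming the available fields if field_name is missing.
    """
    try:
        idx = field_names.index(field_name)
    except ValueError:
        raise KeyError("note type has no field %r; its fields are %r"
                       % (field_name, field_names))
    return flds.split(FIELD_SEP)[idx]
```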
And one more question.

How do you get Morphman to ignore certain decks? I have 6 different decks but only one for main use. It seems to get stuck recognizing other decks.

In the meantime I created a new Anki profile called "test" and imported only my main deck into it. I extracted all morphemes from this deck of 25,000 cards. The total morphemes listed in Morpheme Manager is 1996. Can this be right? 1996 morphemes from a deck of 25,000 cards? I don't know much about it, I guess, but it seems kinda low.

Any advice on how to move forward would be great. Thanks, guys.
I might be able to help you with your most recent post.
Then again, a video might be more helpful.
Edited: 2014-11-04, 2:47 am
Aspiring Wrote:I might be able to help you with your most recent post.
Then again, a video might be more helpful.
I have watched this video but I'm afraid I didn't understand it. I think it assumes a working knowledge of Python, which I really don't have.