overture2112 Wrote: Not out of the box, but if you had a Chinese equivalent of `mecab` (the Japanese morphological analysis tool that MorphMan uses) then it could probably be adapted.

lesson17 Wrote: What if the Chinese sentences were pre-segmented?

overture2112 Wrote: You'd lose out on the part-of-speech stuff, but otherwise it could work. Try looking at morphemes.py and rewriting `getMorphemes`.

I seem to have got MorphMan working with Chinese (including POS tagging) by following your recommendation. Using jieba (https://github.com/fxsjy/jieba#jieba-1), I changed the `getMorphemes` function in morphemes.py as follows:
Code:
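import re
import jieba.posseg as posseg  # jieba's part-of-speech segmenter

@memoize
def getMorphemes( e, ws=None, bs=None ): # Str -> PosWhiteList? -> PosBlackList? -> IO [Morpheme]
    # keep only CJK ideographs and ASCII alphanumerics, i.e. remove all punctuation
    e = u''.join(re.findall( u'[\u4e00-\u9fffa-zA-Z0-9]+', e ))
    # find morphemes using jieba's POS segmenter; the word and POS flag come from jieba, the other fields are placeholders
    ms = [ Morpheme( m.word, u'N/A', m.flag, u'N/A', u'N/A' ) for m in posseg.cut(e) ]
    return ms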
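For the pre-segmented case asked about in the quote, something along these lines might work without jieba. This is only a rough sketch, not anything from MorphMan itself: it assumes the segments are separated by whitespace, `getMorphemesPreSegmented` is a made-up name, and the `Morpheme` namedtuple is a stand-in whose field names are my guess at what the real class uses. As noted above, you get no part-of-speech information this way.

```python
# -*- coding: utf-8 -*-
import re
from collections import namedtuple

# Hypothetical stand-in for MorphMan's Morpheme class; the field names are assumed.
Morpheme = namedtuple('Morpheme', ['base', 'inflected', 'pos', 'subPos', 'read'])

# Same character class as in the jieba version: CJK ideographs plus ASCII alphanumerics.
TOKEN_RE = re.compile(u'[\u4e00-\u9fffa-zA-Z0-9]+')

def getMorphemesPreSegmented(e):
    """Treat each whitespace-separated chunk as one morpheme, with no POS info."""
    ms = []
    for chunk in e.split():
        # drop punctuation inside the chunk; skip chunks that were only punctuation
        word = u''.join(TOKEN_RE.findall(chunk))
        if word:
            ms.append(Morpheme(word, u'N/A', u'N/A', u'N/A', u'N/A'))
    return ms

# Example: a sentence that has already been segmented with spaces.
ms = getMorphemesPreSegmented(u'我 喜欢 学习 中文 。')
print([m.base for m in ms])
```

The trade-off is that the quality of the morphemes depends entirely on whatever tool or person did the segmenting, whereas the jieba version does the segmenting and the tagging in one step.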
Edited: 2014-02-13, 12:28 am
