How I compare them:
I usually play with the data structures in a Python interpreter. In this case I just created a list of all the expressions, calculated its length, then converted it to a set (thus removing duplicates) and calculated the length again. The difference between the two lengths is the number of duplicates. This should be correct, albeit extremely strict: the comparison is case-sensitive, and a sentence with an extra space or different punctuation is considered different.
Code:
>>> from mach4 import *
[ snipped output for brevity ]
>>> len( [ d[i]['sentExpr'] for i in d ] )
9669
>>> len(set( [ d[i]['sentExpr'] for i in d ] ))
9604
>>>
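So the strict comparison above finds 9669 - 9604 = 65 exact duplicates. If you want something less strict, here is a rough sketch of how the same check could be done after normalizing each sentence. It is only an illustration: it assumes Python 3 and the same layout as above (a dict whose values each have a 'sentExpr' key). The names normalize, count_duplicates and sample are made up for this example and are not part of mach4, and it strips ASCII punctuation only, so punctuation from other scripts would need its own handling.

```python
import string

def normalize(sentence):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    no_punct = sentence.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())

def count_duplicates(d, key="sentExpr", loose=False):
    """Return how many entries repeat an expression already seen.

    With loose=False this is the same strict list-vs-set comparison as in
    the session above; with loose=True expressions are normalized first.
    """
    exprs = [entry[key] for entry in d.values()]
    if loose:
        exprs = [normalize(e) for e in exprs]
    return len(exprs) - len(set(exprs))

# Hypothetical toy data standing in for the real deck.
sample = {
    1: {"sentExpr": "The cat sat."},
    2: {"sentExpr": "The cat sat."},   # exact duplicate of entry 1
    3: {"sentExpr": "the  cat sat"},   # differs only in case, spacing, punctuation
    4: {"sentExpr": "A dog ran."},
}

print(count_duplicates(sample))              # strict: 1
print(count_duplicates(sample, loose=True))  # loose: 2
```

On the toy data the strict check only catches the exact copy, while the loose check also catches entry 3. On the real data the loose count can only be equal to or higher than the strict one.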
Edited: 2011-01-30, 1:48 am
