That is, I have a sentence deck. I would like to take all of the sentences I know from that deck and break them up word by word, so that I can use cb's frequency report generator to compare them against my Japanese ebooks. Essentially, I want to figure out how readable a novel is before I dive into it.
The problem is that cb's tool needs each word to be on its own line. For some reason, although it seems to split text into words for frequency analysis, the readability analysis tool does not automatically break sentences into words.
So essentially what I'm asking for is a tool that can parse the words out of a list of sentences and spit out a (non-repeating) list of those words. I'm a programmer, so I could write it myself if need be, but I'd hate to repeat the work if someone else has already done it.
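In case nobody has a ready-made tool, here is a minimal sketch of the idea in plain Python (standard library only). Japanese has no spaces, so real segmentation is usually done with a morphological analyzer such as MeCab, which also reduces conjugated forms to their dictionary form. This sketch swaps in a much simpler technique instead: greedy left-to-right longest-match segmentation against a word list. The `lexicon` set and the function names are hypothetical (you might build the lexicon from your deck's vocabulary field), and conjugated forms like 読んだ and 読む will not be merged.

```python
import unicodedata
from typing import Iterable, List, Set


def longest_match_tokenize(sentence: str, lexicon: Set[str], max_len: int = 8) -> List[str]:
    """Greedy longest-match segmentation (hypothetical stand-in for MeCab).

    At each position, try the longest substring (up to max_len chars) that is
    in the lexicon; if nothing matches, emit a single character and move on.
    """
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        # Try candidates from longest to shortest; j == i + 1 is the fallback.
        for j in range(min(len(sentence), i + max_len), i, -1):
            candidate = sentence[i:j]
            if candidate in lexicon or j == i + 1:
                tokens.append(candidate)
                i = j
                break
    return tokens


def is_word(token: str) -> bool:
    """Reject tokens made only of punctuation (P*), separators (Z*) or control chars (C*)."""
    return any(unicodedata.category(c)[0] not in "PZC" for c in token)


def unique_words(sentences: Iterable[str], lexicon: Set[str], max_len: int = 8) -> List[str]:
    """Segment every sentence and return each word once, in first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for sentence in sentences:
        for token in longest_match_tokenize(sentence.strip(), lexicon, max_len):
            if is_word(token):
                seen.setdefault(token, None)
    return list(seen)


if __name__ == "__main__":
    # Hypothetical sample data; in practice, read the known sentences from a deck export.
    lexicon = {"私", "は", "日本語", "を", "勉強", "します", "本", "読む"}
    sentences = ["私は日本語を勉強します。", "私は本を読む。"]
    # One word per line, no repeats.
    print("\n".join(unique_words(sentences, lexicon)))
```

With the sample data this prints 私, は, 日本語, を, 勉強, します, 本, 読む, one per line. Longest-match is crude (it can pick the wrong split when a longer lexicon entry straddles a word boundary), so for real use you would swap `longest_match_tokenize` for a proper analyzer and keep the deduplication step as-is.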
