
cb's Japanese Text Analysis Tool

Hello,

I have just released version 2.0 of cb's Japanese Text Analysis Tool.

Download cb's Japanese Text Analysis Tool v2.0 via Google Code

What Changed?

Added an option to choose the parse method for the word frequency report. You may now use either MeCab or JParser.

MeCab is a widely used morphological analyzer. However, it splits certain conjugated forms into separate pieces instead of reducing them to the dictionary form. Example: 挟まれる is parsed as 挟ま and れる instead of 挟む.

JParser is the method used by Translation Aggregator to generate furigana. It has no problem with the above conjugation and has better support for short expressions. The downside is that JParser is much slower than MeCab.
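
To illustrate why the parse method matters for the frequency report, here is a minimal Python sketch. The token lists are typed in by hand from the 挟まれる example above (they are not real MeCab or JParser output), and the variable names are made up for illustration.

```python
from collections import Counter

# Hand-written token lists for the single word 挟まれる, following the example above.
# Illustrative only: this is not actual analyzer output.
mecab_style_tokens = ["挟ま", "れる"]
jparser_style_tokens = ["挟む"]

mecab_style_counts = Counter(mecab_style_tokens)
jparser_style_counts = Counter(jparser_style_tokens)

# With the split parse, the dictionary form 挟む never shows up in the counts.
print(mecab_style_counts["挟む"])    # 0
print(jparser_style_counts["挟む"])  # 1
```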

For JParser you have the option of always using the kanji form of a word even if it is normally written in kana. If checked, ともに will be converted to 共に. If unchecked, 共に will be converted to ともに.
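
The effect of this checkbox can be sketched roughly as follows. This is only an illustration under my own assumptions: the one-entry word list, the `normalize` function, and the `always_kanji` parameter are hypothetical and are not taken from JParser or the tool itself.

```python
# Hypothetical mini-dictionary of (kanji form, kana form) pairs for words
# that are normally written in kana. A real dictionary would be far larger.
USUALLY_KANA_PAIRS = [("共に", "ともに")]

def normalize(word, always_kanji):
    """Return the form of `word` that would be counted in the report."""
    for kanji, kana in USUALLY_KANA_PAIRS:
        if word in (kanji, kana):
            # Checked: always report the kanji form.
            # Unchecked: report the kana form, since that is the usual spelling.
            return kanji if always_kanji else kana
    # Words not in the list are left unchanged.
    return word

print(normalize("ともに", always_kanji=True))   # 共に
print(normalize("共に", always_kanji=False))    # ともに
```

Either way, both spellings end up counted under a single entry, so the report does not split one word across two lines.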

==============================================================

I have also released two new tools (see the second post for details):

1) cb's Frequency Report Combiner. Combines two reports generated by cb's Japanese Text Analysis Tool.

2) cb's Frequency Report Diff Tool. Compares two reports generated by cb's Japanese Text Analysis Tool.

[Edit: These tools are now deprecated. I merged their functionality into JTAT v3.0]
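
For anyone curious what combining and comparing two reports amounts to, here is a rough Python sketch. It assumes a report can be treated as a simple word-to-count mapping and that "compare" means listing the words found in only one report. The real report file format and the tools' actual output are not described in this post, so `combine_reports`, `diff_reports`, and the sample counts are hypothetical.

```python
from collections import Counter

def combine_reports(report_a, report_b):
    """Sum the counts of two word -> frequency mappings."""
    return Counter(report_a) + Counter(report_b)

def diff_reports(report_a, report_b):
    """Return (words only in report_a, words only in report_b), each sorted."""
    only_a = sorted(set(report_a) - set(report_b))
    only_b = sorted(set(report_b) - set(report_a))
    return only_a, only_b

# Made-up sample data, not taken from any real report.
report_a = {"共に": 3, "挟む": 1}
report_b = {"挟む": 2, "本": 5}

combined = combine_reports(report_a, report_b)
only_a, only_b = diff_reports(report_a, report_b)

print(combined["挟む"])  # 3
print(only_a)            # ['共に']
print(only_b)            # ['本']
```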

==============================================================

Here are the latest Japanese Text Analysis Tool reports for a large number of innocent novels:
Download via MediaFire

Includes a word frequency report via MeCab, a word frequency report via JParser (new), the differences between the MeCab report and the JParser report (new), a kanji frequency report, and a readability report.

cb4960
Edited: 2012-05-27, 1:40 pm