cb's Japanese Text Analysis Tool - Printable Version
kanji koohii FORUM (http://forum.koohii.com), Forum: Learning resources (http://forum.koohii.com/forum-9.html), Thread: cb's Japanese Text Analysis Tool (/thread-9459.html)
cb's Japanese Text Analysis Tool - wareya - 2015-07-02
Using 5.1.0.0:
>『それは違う。あなた間違っていますよ』って言ったらどうします?
MeCab: 1 それは 7.14285714 85.71428571 副詞,一般,*,*
JParser: 1 それは 10.00000000 70.00000000 adv
Blacklisting それは removes it from the list without causing it to pick up on それ, of course.

cb's Japanese Text Analysis Tool - cb4960 - 2015-07-03
Thanks for the example. Unfortunately kuromoji appears to be a Java library, so I won't be able to integrate it as easily as I may have hoped. There's also the issue of requiring Java to be installed.

cb's Japanese Text Analysis Tool - Tamba - 2015-07-03
From their FAQ:
Quote: Is Kuromoji available in C or C++?

cb's Japanese Text Analysis Tool - cb4960 - 2015-07-03
After some further analysis, I've determined that the issue is not with MeCab, but rather with the user dictionary that I am providing to it. I plan to remove それは and perhaps other entries from the user dictionary in the next release. I will also add an option to disable the user dictionary.

cb's Japanese Text Analysis Tool - wareya - 2015-07-03
Thank you for your efforts.

cb's Japanese Text Analysis Tool - cb4960 - 2015-07-03
Hello,
I have just released version 5.2 of cb's Japanese Text Analysis Tool.
Download cb's Japanese Text Analysis Tool v5.2 via SourceForge
What Changed?
● Removed entries from the MeCab user dictionary that interfered with MeCab's statistical model.
● Added the Use Enhanced Dictionary option when MeCab is selected. This enables the MeCab user dictionary.
cb4960

cb's Japanese Text Analysis Tool - xd1986k - 2015-07-13
Word frequency reports are turning out empty with the newest version. Is this a bug or just me?
EDIT: Fixed by following the steps detailed by blindbox in this thread.

cb's Japanese Text Analysis Tool - jcdietz03 - 2015-07-29
I found a text of the book I want to read and I analyzed it using this tool. How can I use the output from this tool to further my studies?
What I did was:
1. Copy-paste the word frequency report into editpad.org.
2. Scan down the list for unfamiliar words with f >= 10 (my text is a light novel, approx. 50,000 "words" long; not sure how many characters).
2a. For unfamiliar words, use Rikaisama to make an Anki-importable TSV of unfamiliar words and definitions.
3. Import into Anki.
Is that a good method?

cb's Japanese Text Analysis Tool - yogert909 - 2015-07-29
There is a tool in the program to compare two word lists. If you have a word list of known words, you could filter out all of the words that you already know, which would automate step 2. There's also an Anki add-on that will add definitions to cards, so you could automate step 2a.

cb's Japanese Text Analysis Tool - xd1986k - 2015-07-30
Or you could use cb's EPWING2ANKI for step 2a. It adds example sentences too, so that's a plus.

cb's Japanese Text Analysis Tool - cb4960 - 2015-07-31
Hello,
I have just released version 5.3 of cb's Japanese Text Analysis Tool.
Download cb's Japanese Text Analysis Tool v5.3 via SourceForge
What Changed?
● Added some optimizations.
● Added analysis time to the Complete dialog.
● Fixed bug in user-readability report not using the Use Enhanced Dictionary option.
● Added the max_tasks option to settings.txt.
● Now targets .NET 4.5.
cb4960

cb's Japanese Text Analysis Tool - xd1986k - 2015-08-02
Getting out of memory exceptions with large files in the new version.
[spoiler]
See the end of this message for details on invoking just-in-time (JIT) debugging instead of this dialog box.

************** Exception Text **************
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Text.StringBuilder.ToString()
   at System.IO.StreamReader.ReadToEnd()
   at JapaneseTextAnalysisTool.Mecab.parseWithExe(String input, Boolean userDic)
   at JapaneseTextAnalysisTool.Mecab.parseFields(String input, Boolean userDic, CancellationTokenSource cancelTokenSource)
   at JapaneseTextAnalysisTool.FreqWord.addWithMecab(String text, CancellationTokenSource cancelTokenSource)
   at JapaneseTextAnalysisTool.FreqWord.addFileText(String text, CancellationTokenSource cancelTokenSource)
   at JapaneseTextAnalysisTool.FormMain.analyzeFile(FileInfo file)
   at JapaneseTextAnalysisTool.FormMain.<>c__DisplayClass4.<analyzeFileAsync>b__3()
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at JapaneseTextAnalysisTool.FormMain.<callAnalyzeFile>d__0.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.<ThrowAsync>b__4(Object state)

************** Loaded Assemblies **************
mscorlib
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34209 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.NET/Framework/v4.0.30319/mscorlib.dll
----------------------------------------
JapaneseTextAnalysisTool
    Assembly Version: 5.3.0.0
    Win32 Version: 5.3.0.0
    CodeBase: file:///D:/JTAT/JapaneseTextAnalysisTool.exe
----------------------------------------
System.Windows.Forms
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34251 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Windows.Forms/v4.0_4.0.0.0__b77a5c561934e089/System.Windows.Forms.dll
----------------------------------------
System.Drawing
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34209 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Drawing/v4.0_4.0.0.0__b03f5f7f11d50a3a/System.Drawing.dll
----------------------------------------
System
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34238 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System/v4.0_4.0.0.0__b77a5c561934e089/System.dll
----------------------------------------
System.Configuration
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34209 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Configuration/v4.0_4.0.0.0__b03f5f7f11d50a3a/System.Configuration.dll
----------------------------------------
System.Xml
    Assembly Version: 4.0.0.0
    Win32 Version: 4.0.30319.34234 built by: FX452RTMGDR
    CodeBase: file:///C:/Windows/Microsoft.Net/assembly/GAC_MSIL/System.Xml/v4.0_4.0.0.0__b77a5c561934e089/System.Xml.dll
----------------------------------------

************** JIT Debugging **************
To enable just-in-time (JIT) debugging, the .config file for this application or computer (machine.config) must have the jitDebugging value set in the system.windows.forms section. The application must also be compiled with debugging enabled.

For example:

<configuration>
    <system.windows.forms jitDebugging="true" />
</configuration>

When JIT debugging is enabled, any unhandled exception will be sent to the JIT debugger registered on the computer rather than be handled by this dialog box.
[/spoiler]
On failure, the mecab folder has some really large files left over (mecab_in and mecab_out) which aren't deleted.

cb's Japanese Text Analysis Tool - cb4960 - 2015-08-02
I just uploaded the 64-bit version; try that one instead. It should allow JTAT access to more memory. Not sure why it's taking up so much memory to begin with, though. Analysis of the 5000+ innocent novel set doesn't use more than ~200 MB on my machine. How large is the file that you are analyzing?
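
The stack trace above fails inside StreamReader.ReadToEnd, called from Mecab.parseWithExe, which suggests the parser's whole output was being collected into one string. As a rough illustration only (this is not JTAT's code, which is a .NET application), the Python sketch below counts words from MeCab-style output one line at a time, so memory grows with the number of distinct words rather than with the size of the file. It assumes MeCab's default output layout (surface form, a tab, comma-separated features, with EOS closing each sentence); count_surface_forms is a hypothetical name and the sample lines are made up.

```python
import io
from collections import Counter
from typing import Iterable


def count_surface_forms(lines: Iterable[str]) -> Counter:
    """Count surface forms from MeCab-style output, one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # "EOS" marks the end of a sentence; blank lines carry no token.
        if not line or line == "EOS":
            continue
        # Everything before the first tab is the surface form.
        surface, _, _features = line.partition("\t")
        counts[surface] += 1
    return counts


# A file object is itself an iterable of lines, so the full output is
# never held in memory at once. StringIO stands in for a real file here.
sample = io.StringIO(
    "それは\t副詞,一般,*,*\n"
    "違う\t動詞,自立,*,*\n"
    "EOS\n"
    "それは\t副詞,一般,*,*\n"
    "EOS\n"
)
counts = count_surface_forms(sample)
print(counts.most_common())
```

With a real file, passing the object returned by open(path, encoding="utf-8") in place of the StringIO would work the same way.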
cb's Japanese Text Analysis Tool - xd1986k - 2015-08-03
Working perfectly now. Thank you. As for the size, I was analyzing a 400 MB folder. The largest file should've been 40 MB.

cb's Japanese Text Analysis Tool - jahu00 - 2015-08-16
Some time ago I made a small modification to the app (I think I used the 5.0 version). Basically, I added the option to save separate reports for each file when processing directories. I planned to use those reports for making vocab drills based on chapters of web novels (don't ask). My vocab drill app got stuck in a very beta stage, so I guess it's not very usable for anyone but me, but maybe someone will find the mod to Japanese Text Analysis Tool useful.
The code and executable can be found here: https://github.com/jahu00/JTATmod
I tried contacting cb about this before (through SourceForge), but I'm not sure if he ever got my message.

cb's Japanese Text Analysis Tool - cb4960 - 2015-10-03
Hello,
I have just released version 5.4 of cb's Japanese Text Analysis Tool.
Download cb's Japanese Text Analysis Tool v5.4 via SourceForge
What Changed?
● Added the "Frequency Group" and "Frequency Rank" fields to both the Word Frequency Report and Kanji Frequency Report.

Frequency Group: All words in the analysis that share the exact same frequency (Field 1) will be assigned to a numbered Frequency Group, with group 1 containing the most common word(s), group 2 containing the next most common word(s), and so on.

Frequency Rank: For a given word, the Frequency Rank is the total number of words in the analysis that are more frequent than the given word + 1. For example, if the given word has a Frequency Rank of 500, then there are 499 other words in the analysis that are more frequent than the given word.
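
To make the two definitions above concrete, here is a minimal Python sketch that assigns a Frequency Group and a Frequency Rank to each word from a table of counts. It follows the definitions as stated in the post and is not taken from JTAT's source; group_and_rank and the sample counts are hypothetical.

```python
from collections import Counter


def group_and_rank(counts):
    """Return (word, frequency, group, rank) rows, most frequent first."""
    # Sort by descending frequency. Ties are broken by the word itself
    # only so that the output order is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    group = 0
    rank = 0
    previous = None
    for position, (word, freq) in enumerate(ordered, start=1):
        if freq != previous:
            group += 1       # a new distinct frequency opens a new group
            rank = position  # number of strictly more frequent words, plus 1
            previous = freq
        rows.append((word, freq, group, rank))
    return rows


counts = Counter({"の": 5, "は": 5, "猫": 3, "犬": 1, "鳥": 1})
rows = group_and_rank(counts)
for row in rows:
    print(row)
```

Words with the same count share both a group and a rank. In the sample, the two words with count 5 both get group 1 and rank 1, while the word with count 3 gets group 2 but rank 3, because two words are more frequent than it.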
New Word Frequency Report format:
Field 1: Number of times word was encountered
Field 2: Word
Field 3: Frequency Group
Field 4: Frequency Rank
Field 5: Percentage (Field 1 / Total number of words)
Field 6: Cumulative percentage
Field 7: Part-of-speech

New Kanji Frequency Report format:
Field 1: Number of times kanji was encountered
Field 2: Kanji
Field 3: Frequency Group
Field 4: Frequency Rank
Field 5: Percentage (Field 1 / Total number of kanji)
Field 6: Cumulative percentage

Innocent Novel analysis (Sample_Output_151003.zip) can be found at the link above.
cb4960

cb's Japanese Text Analysis Tool - rainmaninjapan - 2015-10-05
You're a beautiful man cb. Danke.

cb's Japanese Text Analysis Tool - ryuudou - 2015-10-05
Getting this on Windows 7 when clicking the analyze button:

************** Exception Text **************
System.TypeLoadException: Could not load type 'System.Runtime.CompilerServices.IAsyncStateMachine' from assembly 'mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089'.
   at JapaneseTextAnalysisTool.FormMain.callAnalyzeFile(FileInfo file)
   at JapaneseTextAnalysisTool.FormMain.performAnalysis()
   at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
   at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.ButtonBase.WndProc(Message& m)
   at System.Windows.Forms.Button.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)

64-bit version of the program. I also can't tell you if that blank report error was ever fixed, as I'm not on Windows 8 anymore, where that took place.

cb's Japanese Text Analysis Tool - cb4960 - 2015-10-05
ryuudou Wrote: Getting this on Windows 7 when clicking the analyze button:
Do you have .NET version 4.5 installed?
To find .NET Framework versions by viewing the registry (.NET Framework 4.5 and later)

cb's Japanese Text Analysis Tool - cb4960 - 2015-10-10
Hello,
I have just released version 2.3 of cb's Japanese Frequency List Sorter.
Download cb's Japanese Frequency List Sorter via SourceForge
What Changed?
● Added "Output entire line" option.
● Added "Append number of times encountered" option.
● Added "Append Frequency Group" option.
● Added "Append Frequency Rank" option.
● Updated word_freq_report_mecab.txt and kanji_freq_report.txt.
cb4960

cb's Japanese Text Analysis Tool - ryuudou - 2015-10-11
cb4960 Wrote:
  ryuudou Wrote: Getting this on Windows 7 when clicking the analyze button:
  Do you have .NET version 4.5 installed?
It seems to work now, but now I'm getting blank word reports with both parsers. This also happened in Windows 8.

cb's Japanese Text Analysis Tool - cb4960 - 2015-10-11
ryuudou Wrote:
  cb4960 Wrote:
    ryuudou Wrote: Getting this on Windows 7 when clicking the analyze button:
    Do you have .NET version 4.5 installed?
  It seems to work now, but now I'm getting blank word reports with both parsers. This also happened in Windows 8.
Someday I'll have to add logging to JTAT to get to the bottom of this. Another user who was having a similar issue apparently found a workaround:
http://forum.koohii.com/showthread.php?pid=220227#pid220227

cb's Japanese Text Analysis Tool - Gensan - 2015-10-31
I have the blank word report problem, and I can't download the 4.4 version.
Sorry, nvm. No space does the trick.
Btw, is there a way to blacklist parts of speech? I want JTAT to ignore 感動詞 (interjections), 助詞 (particles) and 助動詞 (auxiliary verbs).

RE: cb's Japanese Text Analysis Tool - Zarxrax - 2015-11-30
Anyone know a tool or some code I can use to take the word frequency report and limit it to only words which contain kanji?
RE: cb's Japanese Text Analysis Tool - yogert909 - 2015-11-30
Zarxrax Wrote: Anyone know a tool or some code I can use to take the word frequency report, and limit it to only words which contain kanji?
If you open the report in a text editor that supports regular expressions, search for the following and replace it with nothing.
Search for: ^((?![\x{4e00}-\x{9faf}]).)*\n
Replace with: (leave empty)
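
For anyone who would rather script this than use an editor, here is a minimal Python sketch of the same idea, using the same character range (U+4E00 to U+9FAF). It tests only the word field (Field 2) rather than the whole line, because MeCab's part-of-speech column itself contains kanji (for example 副詞), so a whole-line test would keep every row of a MeCab report. The tab separator, the sample rows and the name words_with_kanji are assumptions made for illustration, not details confirmed in this thread.

```python
import re

# Same range as the editor pattern above. Python writes it as \u4e00-\u9faf.
KANJI = re.compile(r"[\u4e00-\u9faf]")


def words_with_kanji(lines, sep="\t", word_field=1):
    """Keep report lines whose word field contains at least one kanji."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # word_field=1 is Field 2 of the report (the word), zero-based.
        if len(fields) > word_field and KANJI.search(fields[word_field]):
            kept.append(line)
    return kept


# Made-up rows in the 5.4 layout: count, word, group, rank,
# percentage, cumulative percentage, part-of-speech.
report = [
    "3\tそれは\t1\t1\t50.0\t50.0\t副詞,一般,*,*",
    "2\t違う\t2\t2\t33.3\t83.3\t動詞,自立,*,*",
    "1\tあなた\t3\t3\t16.7\t100.0\t名詞,代名詞,一般,*",
]
kept = words_with_kanji(report)
print(kept)
```

If the report turns out to use a different separator, pass it as sep; the filtering logic stays the same.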