Comparing kanji lists
kanji koohii FORUM (http://forum.koohii.com)
Forum: Learning Japanese, General discussion
Thread: Comparing kanji lists (/thread-10346.html)

Comparing kanji lists - thecite - 2013-01-03
I've got two big lists of kanji in separate Word documents, and I want to compare the data and see which kanji from list #1 aren't in list #2. Does anyone know how I'd go about doing this? Any help would be appreciated, thanks.

Comparing kanji lists - Mushi - 2013-01-03
Could you copy only the kanji from the two sources into two text files with one kanji on each line, sort them, then do a line-by-line comparison? It depends on your OS, but on Windows, for example, I believe you could save them in Notepad in UTF-8 text format, sort each text file with sort.exe, then diff them textually via fc.exe (file compare) or via some text comparison tool like Beyond Compare. When doing similar things before, I've found that the only minor issue to keep in mind is preserving the little UTF-8 file marker (the byte order mark) at the beginning of the text files when managing the output...

Comparing kanji lists - Mushi - 2013-01-03
Or, it occurred to me that since you're using Word, you probably also have Excel, which may be easier for you to use. If so, you may want to search Excel help for the various ways you can do this, for example, http://office.microsoft.com/en-us/excel-help/use-excel-to-compare-two-lists-of-data-HA001103915.aspx.

Comparing kanji lists - frony0 - 2013-01-03
Learn SQL! (I'd tell you how, but I've forgotten exactly how myself...)

Comparing kanji lists - thecite - 2013-01-03
Thanks for your help! Keep in mind that there are likely 1500+ characters that aren't in list #2, so any manual compilation would be an extreme hassle. Anyway, I've got the UTF-8 .txt files, but how do I run them through sort.exe? Googled it, but couldn't find any good explanations.

Comparing kanji lists - Katsuo - 2013-01-03
If you have an Apple computer then here's a simple AppleScript to do that using the TextEdit word processor application.
The script will go through each character of a TextEdit document (named docOne.rtf) and see if it is present anywhere in another document (named docTwo.rtf). It will make a list of all those that are not in docTwo. (First set up the two documents, then paste everything below this paragraph into the AppleScript Editor and click "Run". The result will appear in the bottom pane. If there are thousands of characters it may take a few minutes.)

    tell application "TextEdit"
        set totalList to ""
        -- Both documents must already be open in TextEdit
        set text1 to text of document "docOne.rtf"
        set text2 to text of document "docTwo.rtf"
        set totalChar to count of characters of text1
        -- Collect every character of docOne that is absent from docTwo
        repeat with characX from 1 to totalChar
            set nexChar to character characX of text1
            if nexChar is not in text2 then set totalList to totalList & nexChar
        end repeat
        say "I have finished"
        -- The last expression is what shows up in the result pane
        totalList
    end tell

Comparing kanji lists - thecite - 2013-01-03
Thanks Katsuo, that sounds like exactly what I'm looking for. Do the two documents have to be saved to any particular location?
Edit: Worked like a charm, thank you.

Comparing kanji lists - lauri_ranta - 2013-01-03
On OS X you could also save the lists to plain text files and run this in Terminal:

    comm -23 <(sort 1.txt) <(sort 2.txt)

Another option using Ruby:

    ruby -e 'puts File.readlines("1.txt") - File.readlines("2.txt")'
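
For anyone not on a Mac, the character-by-character check that Katsuo's script performs can be sketched in Python. This is only a rough illustration, not something posted in the thread: the function name `missing_chars` is made up here, and unlike the AppleScript it skips whitespace and the UTF-8 byte order mark and reports each missing character only once. Because it compares characters rather than lines, it should not matter whether the lists have one kanji per line.

```python
def missing_chars(text1, text2):
    """Return the characters of text1 that never occur in text2.

    Hypothetical helper for illustration. Assumptions: whitespace and the
    byte order mark (U+FEFF) are ignored, and each missing character is
    reported once, in the order it first appears in text1.
    """
    present_in_two = set(text2)   # set membership is fast, even for 1500+ kanji
    already_reported = set()
    result = []
    for ch in text1:
        if ch.isspace() or ch == "\ufeff":
            continue              # skip line breaks, spaces and the BOM
        if ch not in present_in_two and ch not in already_reported:
            already_reported.add(ch)
            result.append(ch)
    return result


# Small inline demonstration with made-up sample strings
print("".join(missing_chars("日本語学習", "本学")))  # prints 日語習
```

To run it on real files, the two texts could be read with something like `Path("1.txt").read_text(encoding="utf-8-sig")` from `pathlib` (file names borrowed from the Terminal example above); the `utf-8-sig` codec drops the byte order mark that Notepad writes.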