looking for software to compare texts

Reply #1 - 2010 April 17, 8:25 pm
clemente Member
From: venexia Registered: 2008-11-06 Posts: 22

Hello everyone,

I was wondering if anyone knows how to compare several Japanese texts and see whether any sentences match (and, obviously, see what the matches are).
I have an archive of around 100 texts, and I want to compare each one against all the others to find sentences that are exactly the same. It would be nice to be able to set parameters such as minimum length, exact or similar match, etc.
I found a piece of software called Corsis (formerly Tenka text), but its documentation is not yet complete and I have not been able to use it for this purpose.
I also tried anti-plagiarism software, but those programs generally either don't handle Japanese characters or are too expensive.
Thanks for any help and suggestions.
Cheers.

Reply #2 - 2010 April 17, 9:22 pm
FooSoft Member
From: Seattle, WA Registered: 2009-02-15 Posts: 513 Website

That sounds like a pretty specific task. I think the best bet would be to use a scripting language (Python/Ruby/Perl). You would write some regular expressions that match the kinds of text you want (whole sentences, expressions, whatever) and process the results as needed. Then you unleash the script on the globs of input.

If you are at all technically minded, you should try your hand at scripting. It's not hard, and it lets you harness the power of your PC to do repetitive tasks very quickly :)
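
To give an idea of what such a script might look like, here is a minimal Python sketch. It is only an illustration under some assumptions: the texts are plain .txt files in a folder called "texts" (a made-up name), they are saved as UTF-8, and a "sentence" is anything that ends in 。, ！ or ？ or at a line break. The function names are invented for this example. It only finds exact matches; "similar" matching would need extra work.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: a sentence is a run of characters ending in 。, ！ or ？,
# or ending at a line break. Real texts may need a smarter rule.
SENTENCE = re.compile(r"[^。！？\n]+[。！？]?")


def split_sentences(text):
    """Return the sentences found in text, stripped of surrounding whitespace."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def shared_sentences(texts, min_length=5):
    """Map each sentence that occurs in two or more texts to the names of those texts.

    texts is a dict of {name: full text}. Sentences shorter than
    min_length characters are ignored.
    """
    seen = defaultdict(set)
    for name, text in texts.items():
        for sentence in split_sentences(text):
            if len(sentence) >= min_length:
                seen[sentence].add(name)
    return {s: sorted(names) for s, names in seen.items() if len(names) > 1}


def load_texts(folder, encoding="utf-8"):
    """Read every .txt file in folder (hypothetical layout) into a dict."""
    return {p.name: p.read_text(encoding=encoding)
            for p in sorted(Path(folder).glob("*.txt"))}


if __name__ == "__main__":
    # "texts" is a placeholder folder name; change it to wherever the files are.
    for sentence, names in shared_sentences(load_texts("texts")).items():
        print(sentence, "->", ", ".join(names))
```

Because every sentence is stored once together with the set of files it appears in, each file is read only once rather than being checked pair by pair against all the others. If the files are in Shift-JIS, passing encoding="shift_jis" to load_texts should work.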

Last edited by FooSoft (2010 April 17, 11:42 pm)

nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

UltraEdit lets you compare at least two texts; I'm not sure how many more, or exactly how it works. I've used UltraEdit for years, but mostly out of habit: at the time I began using it, it was one of the few editors I knew of that allowed texts to be opened in multiple tabs. In fact, I think it might've been the only one? Can't remember that far back.

Last edited by nest0r (2010 April 17, 10:55 pm)

Reply #4 - 2010 April 18, 2:10 am
Jarvik7 Member
From: 名古屋 Registered: 2007-03-05 Posts: 3946

BBEdit on OS X or TextPad on Windows should be able to do it with a combination of search & replace (to put each sentence on its own line) and file comparison.

Reply #5 - 2010 April 18, 8:34 pm
clemente Member
From: venexia Registered: 2008-11-06 Posts: 22

Thank you all for the kind replies.
Unfortunately, I am not yet able to write scripts, although I hope to have some time to learn soon. Do you know if anyone has already done something like this?
As for the other software, it works quite well, but I have more than a hundred files, and checking each one against all the others would take a really long time.
Cheers

Reply #6 - 2010 April 18, 8:46 pm
nest0r Member
Registered: 2007-10-19 Posts: 5236 Website

Rereading your initial post, if all you want to do is find instances of words/phrases, just use UltraEdit's "Find in Files". Lots of options there and it lists the results by line in a bottom pane (or a new tab).

Pretty stupid of me, because I asked this here: http://forum.koohii.com/viewtopic.php?pid=93929#p93929 - And I thought I had to make do with Word, but it's much better in UltraEdit to actually see the results. And even though I already knew I could do this, it was like a brain glitch because I didn't put the ideas together till just now.

Edit: Okay, with new UE it works for Shift-JIS and other encodings, fantastic!

Last edited by nest0r (2010 April 18, 10:27 pm)