So there are a lot of tools floating about for Japanese linguistics: quantitative text analysis, corpus linguistics, etc. It's a still-developing area, but I've been browsing around and finding new toys and publications on the topic.
For publications, check out:
Japanese word sketches: towards a new version
A large public-access corpus for Japanese
A web corpus and word sketches for Japanese
For toys we have:
Japanese Wordnet - What's cool is the hypernyms: if you enter a word (e.g. 白) and click on its entry, then in addition to the Japanese + English results it will tell you its hypernyms and hyponyms (e.g. 白 is an achromatic colour, which is in turn a hyponym of colour). The online version also has pictures (e.g. 車 has a tiny car icon). I've barely played with it offline, but I think its structure has potential.
There's also KH Coder, a tool for content analysis that has tonnes of features. Menus are in Japanese. I've been playing with the KWIC concordancing and its collocation sorting for word search results. There's also lots of stuff for graphing, statistics, parts-of-speech analysis, and sentence decomposition/parsing. Still playing with it. There are a lot of other programs out there for Japanese, but I couldn't find anything that worked well, or at all.
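For anyone wondering what KWIC concordancing with collocate sorting boils down to, here's a minimal sketch in Python. It is not how KH Coder does it internally; it assumes the text has already been split into tokens (Japanese needs a segmenter for that), and the sample sentence, `kwic` and `sort_by_right_collocate` are all made up for illustration.

```python
from typing import List, Tuple

Hit = Tuple[List[str], str, List[str]]

def kwic(tokens: List[str], keyword: str, window: int = 2) -> List[Hit]:
    """Return (left context, keyword, right context) for every occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits

def sort_by_right_collocate(hits: List[Hit]) -> List[Hit]:
    """Order concordance lines by the first token right of the keyword."""
    # h[2][:1] is an empty list when the keyword is the last token,
    # so such lines simply sort first instead of raising an error.
    return sorted(hits, key=lambda h: h[2][:1])

# Hypothetical, hand-segmented sample text (an assumption, not real corpus data).
tokens = ["白い", "車", "が", "走る", "。",
          "赤い", "車", "を", "買う", "。",
          "車", "が", "好き", "。"]

hits = sort_by_right_collocate(kwic(tokens, "車"))
for left, kw, right in hits:
    print(" ".join(left), f"[{kw}]", " ".join(right))
```

Sorting on the right-hand neighbour puts the 車が lines together and the 車を line after them, which is the whole point of collocate sorting: recurring patterns line up visually.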
There's also an online but more limited concordancer that uses tools/data provided by the authors of the publications listed above: http://www.someya-net.com/concordancer/index_j.html
There are also a lot of links via the KH Coder page, here:
http://khc.sourceforge.net/
Or rather: http://khc.sourceforge.net/link.html
Edit: Oh! Just as I wrote that I couldn't get others to work (re: Free Japanese collocations), I got Laurence Anthony's AntConc to work: http://www.antlab.sci.waseda.ac.jp/software.html
Thanks to Thora for that last one. I tried it before and couldn't get it to work, but now it's fine (user error corrected).
Last edited by nest0r (2011 May 24, 10:29 pm)
That's interesting. I've been reading about different ways of using semantic categories for word acquisition. I've lost track of which papers interested me, but the idea of using Japanese WordNet to cluster words semantically, by hypernym and such, struck me. Might refine it using some ideas from here: http://findarticles.com/p/articles/mi_7 … _n57103270
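A rough sketch of the clustering idea in Python, just to make it concrete. The `HYPERNYM` table below is hand-written toy data, not pulled from Japanese WordNet, and `hypernym_chain` / `cluster_by_hypernym` are hypothetical helper names; the real thing would read the links from the WordNet database instead.

```python
from collections import defaultdict
from typing import Dict, List

# Toy word -> hypernym links, hand-written for illustration only.
# Assumed to contain no cycles.
HYPERNYM: Dict[str, str] = {
    "白": "無彩色",    # white -> achromatic colour
    "黒": "無彩色",    # black -> achromatic colour
    "無彩色": "色",    # achromatic colour -> colour
    "赤": "有彩色",    # red -> chromatic colour
    "有彩色": "色",    # chromatic colour -> colour
    "車": "乗り物",    # car -> vehicle
}

def hypernym_chain(word: str) -> List[str]:
    """Follow hypernym links upward and return them, nearest first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def cluster_by_hypernym(words: List[str], level: int = 0) -> Dict[str, List[str]]:
    """Group words by the hypernym found `level` steps up the chain.

    Words whose chain is shorter than that use their topmost hypernym;
    words with no hypernym at all form a cluster of their own.
    """
    clusters = defaultdict(list)
    for w in words:
        chain = hypernym_chain(w)
        if not chain:
            key = w
        else:
            key = chain[min(level, len(chain) - 1)]
        clusters[key].append(w)
    return dict(clusters)

vocab = ["白", "黒", "赤", "車"]
near = cluster_by_hypernym(vocab, level=0)
far = cluster_by_hypernym(vocab, level=1)
print(near)
print(far)
```

With `level=0` 白 and 黒 land together under 無彩色 while 赤 sits alone; one step further up, all three colours merge under 色. Picking the level is basically picking how coarse the semantic groups should be.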
Meanwhile I'm still working out collocations in KH Coder and AntConc. The latter features regex and batch query support, so that could be fun. Edit: Ah! Clusters.
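To give a feel for what regex plus batch queries buys you, here's a small Python sketch. It has nothing to do with AntConc's own code; `batch_search`, the sample text, and the patterns are invented for illustration, and it works on raw unsegmented text using Python's standard `re` module.

```python
import re
from collections import Counter
from typing import Dict, List

def batch_search(text: str, patterns: List[str]) -> Dict[str, Counter]:
    """Run every regex over the text and count each distinct match."""
    results = {}
    for pattern in patterns:
        results[pattern] = Counter(
            m.group(0) for m in re.finditer(pattern, text)
        )
    return results

# Hypothetical sample text and queries (assumptions, not real corpus data).
text = "白い車が走る。赤い車を買う。車が好き。"
patterns = [
    r"車[がを]",     # 車 followed by the particle が or を
    r"[白赤]い車",   # a colour adjective directly before 車
]

results = batch_search(text, patterns)
for pattern, counts in results.items():
    print(pattern, dict(counts))
```

One pattern with a character class stands in for several literal searches, and running a list of patterns in one go gives a quick frequency table per query.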
Edit: Oh, I see what you mean, re: Japanese WordNet at WWWJDIC. The JW links in search results. It's apparently used at Weblio also.
Last edited by nest0r (2011 May 25, 10:28 am)