New toys for Japanese text analysis
kanji koohii FORUM (http://forum.koohii.com) > Learning Japanese (http://forum.koohii.com/forum-4.html) > Learning resources (http://forum.koohii.com/forum-9.html), thread 7876 (/thread-7876.html)
New toys for Japanese text analysis - nest0r - 2011-05-24

There are a lot of tools floating around for Japanese linguistics: quantitative text analysis, corpus linguistics, etc. It's still a developing area, but I've been browsing around finding new toys and publications on the topic.

For publications, check out "Japanese word sketches: towards a new version", "A large public-access corpus for Japanese" and "A web corpus and word sketches for Japanese".

For toys we have:

Japanese WordNet. What's cool is the hypernyms: if you enter a word (e.g. 白) and click on its entry, then in addition to the Japanese and English results it shows the word's hypernyms and hyponyms (e.g. 白 is a hyponym of "achromatic colour", which is in turn a hyponym of "colour"). The online version also has pictures (e.g. 車 has a tiny car icon). I haven't played with it offline much, or at all, but I think its structure has potential.

There's also KH Coder, a content-analysis tool with tonnes of features (the menus are in Japanese). I've been playing with its KWIC concordancing and the collocation sorting of word search results. It also has lots of features for graphing, statistics, part-of-speech analysis, and sentence decomposition/parsing. Still playing with it.

There are a lot of other programs for Japanese out there, but I couldn't find anything else that worked well, or at all. For an online but more limited concordancer that uses tools/data provided by the authors of the publications above: http://www.someya-net.com/concordancer/index_j.html

The KH Coder site also has a lot of links: http://khc.sourceforge.net/ (or rather, http://khc.sourceforge.net/link.html).

Edit: Oh! Just as I wrote that I couldn't get others to work (re: Free Japanese collocations), I got Laurence Anthony's AntConc to work: http://www.antlab.sci.waseda.ac.jp/software.html Thanks to Thora for that one. I'd tried it before and couldn't get it to work, but now it's fine (user error corrected).
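To illustrate why the hypernym structure is interesting, here is a minimal sketch of walking hypernym links upward and collecting every word that falls under a category. The links below are hypothetical and hand-written, loosely following the 白 > achromatic colour > colour example; they are not real Japanese WordNet data. Real WordNet also links synsets rather than bare words and allows more than one hypernym per synset, which this sketch simplifies to a single parent.

```python
# Hypothetical, hand-written child -> parent (hypernym) links.
# NOT real Japanese WordNet data; real WordNet links synsets and may
# give a synset several hypernyms.
HYPERNYMS = {
    "白": "無彩色",    # white -> achromatic colour
    "黒": "無彩色",    # black -> achromatic colour
    "灰色": "無彩色",  # grey -> achromatic colour
    "赤": "有彩色",    # red -> chromatic colour
    "無彩色": "色",    # achromatic colour -> colour
    "有彩色": "色",    # chromatic colour -> colour
    "白鳥": "鳥",      # swan -> bird
    "鳥": "動物",      # bird -> animal
}


def hypernym_chain(word, hypernyms):
    """Follow parent links upward from `word`, nearest hypernym first."""
    chain = []
    seen = {word}
    while word in hypernyms:
        word = hypernyms[word]
        if word in seen:  # stop on an accidental cycle in hand-made data
            break
        seen.add(word)
        chain.append(word)
    return chain


def words_under(category, hypernyms):
    """Return every word whose hypernym chain passes through `category`."""
    return sorted(w for w in hypernyms if category in hypernym_chain(w, hypernyms))


if __name__ == "__main__":
    print(hypernym_chain("白", HYPERNYMS))
    print(words_under("無彩色", HYPERNYMS))
```

Grouping words by a shared ancestor like this is one simple way to get the kind of semantic clusters a hypernym hierarchy makes possible.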
New toys for Japanese text analysis - jettyke - 2011-05-24

lol @ the title

New toys for Japanese text analysis - vix86 - 2011-05-25

Ya, I recently found the Japanese WordNet too. I knew about the Princeton one for English, but hadn't realized until a few days ago (when I saw in the history log that Breen had added WordNet to WWWJDIC) that it had been translated entirely into Japanese, so to speak. Now that I've found it, my original idea for a reverse-dictionary lookup tool for Japanese might finally be possible. I often know part of the word I want and its general meaning, but have no way to look it up. A tool where I could enter the partial fragment, select a group of words semantically similar to the target, and get a list of candidates would be awesome, to say the least.

New toys for Japanese text analysis - nest0r - 2011-05-25

That's interesting. I've been reading about different ways of using semantic categories for word acquisition (though I've lost track of which papers interested me), and the idea of using Japanese WordNet to cluster words semantically, via hypernyms and such, struck me. I might refine it using some ideas from here: http://findarticles.com/p/articles/mi_7100/is_1_14/ai_n57103270

Meanwhile, I'm still working out collocations in KH Coder and AntConc. The latter supports regex and batch queries, so that could be fun.

Edit: Ah! Clusters.

Edit: Oh, I see what you mean re: Japanese WordNet at WWWJDIC: the JW links in the search results. It's apparently used at Weblio too.

New toys for Japanese text analysis - nest0r - 2011-05-25

Another cool tool (found via Japanese WordNet's link to the concept dictionary on the same site): http://langrid.org/playground/dependency-parser.html

Also: http://reed.kuee.kyoto-u.ac.jp/nl-resource/knp-form.html

It's a cool site overall, but the concept dictionary takes forever for me. It might be what you're looking for, though, vix86.
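Japanese dependency parsers such as KNP and CaboCha typically work on bunsetsu (phrase) chunks, each linked to the chunk it modifies. The sketch below shows that data structure for one sentence. The chunk split and head indices are written by hand for illustration, not produced by either parser, and the two checks encode the usual assumptions for Japanese bunsetsu dependencies: every chunk modifies something to its right, and links do not cross.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    surface: str
    head: int  # index of the chunk this one modifies; -1 marks the root


def dependency_pairs(chunks):
    """List (modifier, head) surface pairs for every non-root chunk."""
    return [(c.surface, chunks[c.head].surface) for c in chunks if c.head != -1]


def is_head_final(chunks):
    """True if only the last chunk is the root and every other chunk points rightward."""
    last = len(chunks) - 1
    for i, chunk in enumerate(chunks):
        if chunk.head == -1:
            if i != last:
                return False
        elif not (i < chunk.head <= last):
            return False
    return True


def is_non_crossing(chunks):
    """True if no chunk between i and its head links to a point beyond that head."""
    for i, ci in enumerate(chunks):
        if ci.head == -1:
            continue
        for j in range(i + 1, ci.head):
            if chunks[j].head > ci.head:
                return False
    return True


# Hand-written analysis of 白い車を太郎が見た ("Taro saw the white car").
SENTENCE = [
    Chunk("白い", 1),    # modifies 車を
    Chunk("車を", 3),    # modifies 見た
    Chunk("太郎が", 3),  # modifies 見た
    Chunk("見た", -1),   # root
]

if __name__ == "__main__":
    for modifier, head in dependency_pairs(SENTENCE):
        print(f"{modifier} -> {head}")
```

A morphological analyzer like MeCab would only give the flat token sequence; the `head` links are the extra information a dependency parser adds on top.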
Related: http://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jdepp/

New toys for Japanese text analysis - vix86 - 2011-05-25

I'm still trying to figure out what a concordancer is. Also, aren't the dependency parsers doing pretty much exactly what MeCab does? (MeCab is the thing used in the Japanese Support plugin to split sentences into parts.)

New toys for Japanese text analysis - nest0r - 2011-05-25

vix86 wrote: "I'm still trying to figure out what a concordancer is."

See http://en.wikipedia.org/wiki/Concordancer

For the second bit: no. A dependency parser shows you the dependency relations between words/strings, though it does first run a morphological analyzer like MeCab (e.g. Juman for KNP and ChaSen for CaboCha).

For a more detailed explanation of KNP: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.44.6458&rep=rep1&type=pdf

See also: http://en.wikipedia.org/wiki/Dependency_grammar

Edit: More: http://jones.ling.indiana.edu/~mdickinson/08/614/slides/dg-slides-2x3.pdf

AntConc is really easy to use (for Japanese, make sure to go into Global → Language settings and select UTF-8 or Shift_JIS, etc.), so you could play with it to get a feel for what these tools do.

New toys for Japanese text analysis - toshiromiballza - 2013-01-31

http://www2.mr.hum.titech.ac.jp/~bor/posts/HinokiDraft.html
http://hinoki.ryu.titech.ac.jp/asunaro/main.php?lang=en (edit: seems like this is broken?)
http://hinoki.ryu.titech.ac.jp/natsume/
http://hinoki.ryu.titech.ac.jp/natane
http://hinoki.ryu.titech.ac.jp/nutmeg/
http://cl.naist.jp/chantokun/
http://www.pawel.jp/outline_of_tools/tomarigi/
http://www.pawel.jp/download/tomarigi/
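On the question of what a concordancer is: the core of a KWIC (key word in context) concordancer, like the KWIC view in KH Coder or AntConc's concordance tool, is finding every hit of a search term and showing a window of text on either side. The sketch below is a simplified, character-based version, since Japanese has no spaces; real tools typically segment the text into words first (e.g. with a morphological analyzer like MeCab) and offer many more sort options. Sorting by the right-hand context is just one simple way to make recurring collocates line up.

```python
def kwic(text, keyword, width=5):
    """Return (left, keyword, right) tuples for every occurrence of `keyword`.

    Context windows are measured in characters, not words (a simplification).
    """
    lines = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        lines.append((left, keyword, right))
        start = text.find(keyword, start + 1)  # allow overlapping hits
    return lines


def sort_by_right(lines):
    """Sort concordance lines by their right-hand context."""
    return sorted(lines, key=lambda line: line[2])


def format_line(line, width=5):
    """Right-align the left context so the keywords line up in a column."""
    left, keyword, right = line
    return f"{left:>{width}}【{keyword}】{right}"


if __name__ == "__main__":
    sample = "白い車を見た。黒い車が来た。白い犬が走った。"
    for line in sort_by_right(kwic(sample, "車", width=3)):
        print(format_line(line, width=3))
```

With the sample text, the two hits of 車 are listed with their neighbouring characters, so what follows the keyword (が, を) can be compared at a glance.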