The Use of Film Subtitles to Estimate Word Frequencies
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Off topic (http://forum.koohii.com/forum-13.html)
+--- Thread: The Use of Film Subtitles to Estimate Word Frequencies (/thread-5346.html)
The Use of Film Subtitles to Estimate Word Frequencies - nest0r - 2010-04-03

This is especially interesting to me, in light of the comparison of scripted dialogue to 'unscripted' spoken corpora.

The Use of Film Subtitles to Estimate Word Frequencies

Abstract: We examine the use of film subtitles as an approximation of word frequencies in human interactions. Because subtitle files are widely available on the Internet, they may present a fast and easy way to obtain word frequency measures in language registers other than text writing. We compiled a corpus of 52 million French words, coming from a variety of films. Frequency measures based on this corpus compared well to other spoken and written frequency measures, and explained variance in lexical decision times in addition to what is accounted for by the available French written frequency measures.

The Use of Film Subtitles to Estimate Word Frequencies - Asriel - 2010-04-03

nest0r, you know I love the effort you put into research, and it truly is amazing. Perhaps you could set up a website/blog? Perhaps you already have one...

Anyway, I had thought of something similar to this a few days ago, when I realized how easy it is to rip subtitles from DVDs. Unfortunately, that results in idx/sub files, which are hard to pull usable text from effectively. I'm thinking that something like this could be done relatively simply, even just using the subtitles over at d-addicts, etc. Especially for someone more talented at coding than me. Set up a Japanese parser*, run the subtitles through it, and have numbers.

edit: *I don't remember the name of that Japanese Parser, but it's not Mezbup, because that's a member here. It's like... Mecbap, Mecchap, Hubcap, or something along those lines...

The Use of Film Subtitles to Estimate Word Frequencies - mezbup - 2010-04-03

Asriel Wrote: set up Mezbup

できた! (Done!)

The Use of Film Subtitles to Estimate Word Frequencies - nest0r - 2010-04-04

My posts in this forum = nest0r's blog. ;p

FooSoft has a nice site set up, even did some light novel analysis. Can't remember if they've tinkered with subs or not, but yes, there's so much potential out there for multimedia corpora/generating resources from native materials to keep some folks from falling into the binary trap (learning materials vs. native, etc.).

The Use of Film Subtitles to Estimate Word Frequencies - iSoron - 2010-04-04

Asriel Wrote: edit: *I don't remember the name of that Japanese Parser, but it's not Mezbup, because that's a member here. It's like..Mecbap, Mecchap, Hubcap, or something along those lines...

MeCab.

The Use of Film Subtitles to Estimate Word Frequencies - Nemotoad - 2010-04-04

How did I miss this? This thread is full of kittens and frogs! Nest0r, once again, great link, but you forgot to give me a thinly veiled hint to this. Do you know if there's an available frequency list from this corpus study?
The Use of Film Subtitles to Estimate Word Frequencies - nest0r - 2010-04-04

I don't know, I didn't look too hard, because I was thinking they might be worried about copyright? Even though at that point it would just be words, perhaps they were iffy about having the paper and the list so close together. But Boris New participated, and they seem to also be involved with Lexique, so.

The Use of Film Subtitles to Estimate Word Frequencies - nest0r - 2010-04-04

Actually the newest Lexique version seems to have them. You can download it free: http://www.lexique.org/telLexique.php
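
For anyone who wants to try the "run the subtitles through a parser and have numbers" idea from earlier in the thread, here is a rough Python sketch, not a finished tool. It assumes the subtitles are already plain-text SRT files (not idx/sub images). The names `subtitle_lines`, `simple_tokenize` and `word_frequencies` are made up for this example. The tokenizer just pulls out runs of letters, which is only reasonable for languages that separate words with spaces; for Japanese you would pass in a function that wraps a morphological analyzer such as MeCab instead (not shown here).

```python
import re
from collections import Counter

# SRT timing lines look like "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")
# Inline markup such as <i>...</i> or {\an8} that some subtitle files carry.
MARKUP = re.compile(r"<[^>]+>|\{[^}]*\}")


def subtitle_lines(srt_text):
    """Yield dialogue lines only, skipping cue numbers, timings and blanks.

    Caveat: a dialogue line made up only of digits is also skipped,
    because it cannot be told apart from a cue number here.
    """
    for line in srt_text.splitlines():
        line = MARKUP.sub("", line).strip()
        if not line or line.isdigit() or TIMING.search(line):
            continue
        yield line


def simple_tokenize(line):
    """Placeholder tokenizer: lowercase runs of letters, keeping apostrophes.

    Assumption: words are space-separated. Japanese text needs a real
    morphological analyzer plugged in via the `tokenize` argument.
    """
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", line.lower())


def word_frequencies(srt_text, tokenize=simple_tokenize):
    """Count how often each token appears in the dialogue of one SRT file."""
    counts = Counter()
    for line in subtitle_lines(srt_text):
        counts.update(tokenize(line))
    return counts


if __name__ == "__main__":
    sample = """1
00:00:01,000 --> 00:00:03,000
<i>The cat saw the dog.</i>

2
00:00:04,000 --> 00:00:06,000
The dog ran.
"""
    print(word_frequencies(sample).most_common(2))  # [('the', 3), ('dog', 2)]
```

To build a list over a whole collection, you could add up the `Counter` objects from each file (they support `+` and `update`). Keeping the tokenizer as a parameter means the SRT handling stays the same whichever parser you end up using.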