
How words get the message across

#1
http://www.nature.com/news/2011/110124/f...11.40.html

"Longer words tend to carry more information, according to research by a team of cognitive scientists... "

Original: Word lengths are optimized for efficient communication

Abstract:

We demonstrate a substantial improvement on one of the most celebrated empirical laws in the study of language, Zipf's 75-y-old theory that word length is primarily determined by frequency of use. In accord with rational theories of communication, we show across 10 languages that average information content is a much better predictor of word length than frequency. This indicates that human lexicons are efficiently structured for communication by taking into account interword statistical dependencies. Lexical systems result from an optimization of communicative pressures, coding meanings efficiently given the complex statistics of natural language use.
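For anyone wondering how "average information content" might be computed, here is a rough Python sketch. It assumes the measure is the mean surprisal of a word, -log2 P(word | context), averaged over the word's occurrences, with the context simplified to just the preceding word. The function name and the one-word context are assumptions for illustration, not the paper's actual code or setup.

```python
import math
from collections import Counter, defaultdict


def average_information_content(tokens):
    """Return {word: mean surprisal in bits given the preceding word}.

    Hypothetical helper: uses maximum-likelihood bigram estimates with no
    smoothing, so it only makes sense on the corpus it was counted from.
    """
    # How often each word serves as the context (i.e. has a following word).
    context_counts = Counter(tokens[:-1])
    # How often each (previous word, word) pair occurs.
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    surprisal_sum = defaultdict(float)
    occurrences = Counter()
    for (prev, word), n in bigram_counts.items():
        p = n / context_counts[prev]              # P(word | prev)
        surprisal_sum[word] += n * -math.log2(p)  # total bits over n occurrences
        occurrences[word] += n

    # Average over every occurrence that has a preceding word.
    return {w: surprisal_sum[w] / occurrences[w] for w in occurrences}


if __name__ == "__main__":
    sample = "the cat sat on the mat".split()
    for word, bits in sorted(average_information_content(sample).items()):
        print(f"{word}: {bits:.2f} bits")
```

Under this reading, a word that is always predictable from what precedes it scores close to 0 bits however rare it is, which is how the measure differs from raw frequency.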
Edited: 2011-01-31, 2:19 am
#2
Your basic information theory says rare words carry the message.

Just like information theory says random bit patterns contain information...and non-random patterns don't.

As a bit of evidence for the theory, consider the 300 most common English words.

What is this measure "average information content?" It doesn't seem to be word length alone.
#3
jcdietz03: Take an article on some topic, remove the 300 most common English words, and give it to a native speaker; they'll still get the gist. Take that same article, remove the long words instead, and I doubt a native speaker would understand a thing.


Edit: But then of course we're getting to the question of what exactly *is* information. Is getting the gist more important than knowing exactly? And so on. So we're back to square one on linguistic meaning.
Edited: 2011-01-31, 8:34 am
#4
It's not particularly surprising. The most common words are grammar words (a, an, the, for, it, her, him, I, at, in, on, etc.). They occur often because they are needed to glue sentences together. The same is highly likely to be true in Japanese, with the exception of the pronouns. It is completely logical to think that anything that isn't a grammar word is going to carry the bulk of the meaning in the sentence. Imagine I took "cats" out of "I love cats". You would get that I'm talking about something I love, but without "cats" you wouldn't know what. The same wouldn't apply if you removed "I". Hence the two words that are probably not as common as the grammar word "I" carry a lot of the meaning. Then again, this doesn't justify not learning grammar words; it just tells you that the bulk of your work should perhaps be learning non-grammar words.
Edited: 2011-01-31, 9:12 am
#5
Japanese systematically (or maybe not systematically) excludes words from sentences, though. As a learner, I don't like it, but I imagine native speakers would be annoyed if the words were included every time, because that would be too much repetition.

[Person's Name]は? = Where is [person]? Excluded is どこですか

One example I heard is なぜここに!? To mean "Why are you here!?" いる and "you" have been excluded.

As a learner, when words are excluded, it's hard for me to guess what word should have been there.
#6
jcdietz03 Wrote:Your basic information theory says rare words carry the message.

Just like information theory says random bit patterns contain information...and non-random patterns don't.

As a bit of evidence for the theory, consider the 300 most common English words.

What is this measure "average information content?" It doesn't seem to be word length alone.
Information theory doesn't say rare words carry the message. It seems you're conflating several ideas from information theory about information and semantics, and mapping them onto a language's most common or least frequent words overall.

Previously, on semantic information and word frequency/distribution: http://forum.koohii.com/showthread.php?tid=3792
#7
Interesting article, I wonder how it applies to a monosyllabic language like Hmong...