vix86 Wrote:Well the webster dictionary has about 500,000 entries but if you google around some people claim that English has about 1 million words, but that probably includes some truly archaic words and inflections. By comparison the 大辞林 has 255,000 entries and that includes some old words (古代) that probably aren't used any more.
Here's one article that digs into those ~1 million word claims, and goes on to demonstrate how unreliable and arbitrary those word counts can get even within one language, where you don't have to deal with wildly differing morphologies, language policies and plain old prejudices (Académie française vis-à-vis Anglicisms, Russian dictionary makers vs. our highly productive profanities, etc.).
vix86 Wrote:Then there is this but even they state it about the same way I did, I didn't say "English has the most vocabulary" I said "English probably has the biggest vocabulary."
I think "it seems quite probable that English has more words than most comparable world languages" is a lot more modest a claim than "English probably has the biggest vocabulary", especially considering that the end of the article acknowledges the wild differences that exist between morphologies of the world's languages - differences great enough to make vocabulary size comparisons on a global scale utterly meaningless.
vix86 Wrote:English is the result of many merges with other languages over the centuries.
Personally I think that this point tends to get blown out of proportion. I mean, yeah, it may be unique among the Germanic languages (the question of the continued existence of a separate Scots language notwithstanding) and very different from Romance languages in this regard, but the sort of mass borrowing that was triggered by the Norman invasion is hardly a unique event globally - not when there are languages around with only 20% native vocabulary. Heck, Japanese itself is a language where the majority of the vocabulary is borrowed, words with the same basic meanings coming from different source languages get used differently depending on register and context, and the powers that be aren't terribly interested in limiting the influx of loanwords.