One of the more common questions that arrive for the Q&A section asks how many words there are in the English language. Almost as common are requests for the average size of a person’s vocabulary. These sound like easy questions; I have to tell you that they’re indeed easy to ask. But they’re almost impossible to answer satisfactorily, because it all depends on what you mean by word and by vocabulary (or even English).
What we mean by word sounds obvious, but it’s not. Take a verb like climb. The rules of English allow you to generate the forms climbs, climbed, climbable, and climbing, the nouns climb and climber (and their plurals climbs and climbers), compounds such as climb-down and climbing frame, and phrasal verbs like climb on, climb over, and climb down. Now, here’s the question you’ve got to answer: are all these distinct words, or do you lump them all together under climb?
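The choice matters arithmetically. As a minimal sketch in Python, assuming a hand-written table (purely illustrative, not taken from any dictionary) that files each form under a headword, the same list gives very different totals depending on the counting rule:

```python
# Forms from the paragraph above, each mapped to the headword a
# "lumping" dictionary might file it under. The mapping is an
# illustrative assumption, not any real dictionary's policy.
FORMS_TO_HEADWORD = {
    "climb": "climb",
    "climbs": "climb",
    "climbed": "climb",
    "climbing": "climb",
    "climbable": "climb",
    "climber": "climb",
    "climbers": "climb",
    "climb-down": "climb",
    "climbing frame": "climb",
    "climb on": "climb",
    "climb over": "climb",
    "climb down": "climb",
}


def count_splitting(forms):
    """Count every distinct spelling as its own word."""
    return len(set(forms))


def count_lumping(forms_to_headword):
    """Count only the distinct headwords the forms are filed under."""
    return len(set(forms_to_headword.values()))


split_total = count_splitting(FORMS_TO_HEADWORD)
lump_total = count_lumping(FORMS_TO_HEADWORD)
print(split_total, lump_total)  # 12 1
```

Even this understates the problem: a table keyed on spelling cannot tell the verb climb from the noun climb, or the verb form climbs from the plural climbs, so a third counting rule that separates parts of speech would give yet another total.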
That this is not a trivial question can be proved by looking at half a dozen current dictionaries. You won’t find two that agree on what to list. Almost every word in the language has this fuzzy penumbra of inflected forms, separate senses and compounds, some to a much greater extent than climb. To take a famous case, the entry for set in the Oxford English Dictionary runs to 60,000 words. The noun alone has 47 separate senses listed. Are all these distinct words?
And in a wider sense, what do you include in your list of words? Do you count all the regional variations of English? Or slang? Dialect? Family or private language? Proper names and the names of places? And what about abbreviations? The biggest dictionary of abbreviations has more than 400,000 entries; do you count them all as words? And what about informal and formal names for living things? The wood louse is known in Britain by many local names: tiggy-hog, cheeselog, pill bug, chiggy pig, and rolypoly, among others. Are these all to be counted as separate words? And, to take a more specialist example, is Saccharomyces cerevisiae, the formal name for bread yeast, to be counted as a word (or perhaps two)? If you say yes, you’ve got to add another couple of million such names to the English-language word count. And what about medical terms, such as syncytiotrophoblastic or holoprosencephaly, that few of us ever encounter?
The other difficult term is vocabulary. What counts as a word that somebody knows? Is it one that a person uses regularly and accurately? Or perhaps one that will be correctly recognised — say in written text — but not used? Or perhaps one that will be understood in context but which the person may not easily be able to define? This distinction between what linguists call active and passive vocabularies is hard to measure, and it skews estimates.
The problem doesn’t stop there. English speakers not only know words, they know word-forming elements, such as the ending -phobia for some irrational fear. A journalist rushing to meet a deadline might take a word he knows, like Serb, and tack on the ending to make Serbophobia. He’s just added a word to the language (probably only temporarily), but can he really be said to have that word in his vocabulary? If nobody ever uses it again, can we legitimately count it? By reversing the coining process, a reader of the newspaper can easily work out the word’s origin and meaning. Has the reader also added a word to his vocabulary?
Can you now see why estimates of the total number of words in the English language and in a person’s vocabulary are so difficult to make, and why they vary so much one from another? David Crystal, in the Cambridge Encyclopedia of the English Language, suggests that there must be at least a million words in the language. Tom McArthur, in the Oxford Companion to the English Language, comes up with a similar figure. David Crystal further says that if you allow all scientific terms the total could easily reach two million (this doesn’t count the formal names for organisms I spoke about earlier, just technical vocabulary).
Assessing the size of the vocabulary of an individual is at least as problematical. Take Shakespeare: you’d think it would be easy to assess his vocabulary. We have the plays and sonnets and we just have to count the words in them (according to the American Heritage Dictionary, there are 884,647 of them, made up of 29,066 distinct forms, including proper names). But estimates of Shakespeare’s vocabulary vary from about 18,000 to 25,000 in various books, because writers have different views about what constitutes a distinct word.
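The two raw figures in that count, the running total and the number of distinct forms, are the mechanical part of the job. A rough sketch in Python follows; the rule for what counts as a word (a run of letters, with internal apostrophes allowed, compared without regard to case) is an assumption made for illustration, not the method behind the figures quoted above:

```python
import re


def tokens_and_forms(text):
    """Return (running word count, distinct form count) for a text.

    Assumption: a word is a run of letters, optionally with internal
    straight apostrophes, and upper and lower case are not distinguished.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return len(words), len(set(words))


sample = "To be, or not to be, that is the question."
total, distinct = tokens_and_forms(sample)
print(total, distinct)  # 10 8
```

Neither number is a vocabulary size. Getting from distinct forms to distinct words means deciding which forms belong together, and that decision is where the published estimates part company.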
It’s common to see figures for vocabulary quoted such as 10,000-12,000 words for a 16-year-old, and 20,000-25,000 for a college graduate. These seem not to have much research to back them up. Usually they don’t make clear whether active or passive vocabulary is being quoted, and they don’t account for differences in lifestyle, profession and hobby interests between individuals.
David Crystal described a simple research project — using random pages from a dictionary — that suggests these figures are severe underestimates. He concludes that a better average for a college graduate might be 60,000 active words and 75,000 passive ones. But this method of assessing vocabulary counts dictionary headwords only; it would be possible to multiply it several-fold to include different senses, inflected forms, and compounds. Another assessment — of a million-word collection of American texts — identified about 38,000 headwords. Bearing in mind this was all general writing, this doesn’t sound so different from David Crystal’s estimates for graduate vocabularies.
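The arithmetic behind a dictionary-sampling estimate of this kind can be sketched briefly. The details of the project are not given here, so the Python below is only one plausible reading: test a person on a random sample of headwords, then scale the proportion known up to the whole dictionary. The function name and every figure in the example are hypothetical.

```python
def estimate_vocabulary(known_in_sample, sample_size, dictionary_headwords):
    """Scale the share of sampled headwords a person knows up to the
    whole dictionary. Assumes the sample is random and representative."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= known_in_sample <= sample_size:
        raise ValueError("known_in_sample must be between 0 and sample_size")
    return round(dictionary_headwords * known_in_sample / sample_size)


# Hypothetical figures for illustration only: 100 headwords sampled
# from an 80,000-headword dictionary.
passive = estimate_vocabulary(45, 100, 80_000)  # recognised in the sample
active = estimate_vocabulary(30, 100, 80_000)   # actually used
print(passive, active)  # 36000 24000
```

The result can never exceed the number of headwords in the dictionary chosen, and it inherits that dictionary's decisions about what earns a headword, which is one reason such estimates are hard to compare.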