13 April 2012
Abstract: What is an idiom? The answer is both complex and fuzzy. This short paper is a colloquial discussion that begins with a student inquiry about learning idioms and progresses to the realization that idioms are an indeterminate category that raises deep questions about the nature of collocation and cognitive language processing.

She was trying to be conversational. "Tell me some good idioms", she said. I've been guilty of reversing this as a learner of Chinese. "Tell me the best 成语 (chengyu) to learn", I'll demand of some startled Chinese accountancy student or engineer. We should all know better. Not many of us can manage the turn-on-a-pinhead instant wit of an Oscar Wilde, and how can we find a fit for those expectant ears in another head? So we mumble the first thing that comes to mind from tens of thousands of English idioms or Chinese 成语.  

This time I stonewalled the lady, not planning to be rude, just a kind of subconscious rebellion. "Some good idioms?" For what, when, how, why and under what level of threat? She was asking for a bucket of seawater to explain the world’s ocean currents. That’s the trouble with knowing too much about a subject to fit it into a 30-second sound bite (no arrogance intended). She went off in a huff, looking for her sound bite. After all, for a real answer there was always Google online, with zillions of sources and lists. The trouble is, they wouldn’t help her much unless she had a photographic memory, and not much even then. Why not?

The simple answer is that so-called idioms flit in and out of fashion in the endlessly shifting galaxies of location, circumstance, social group, personality ... and so on. I actually see and hear new expressions every day, even after 66 years, or old expressions repackaged. The hard answer is another question: "what the hell is an idiom anyway?" To that there is no easy answer.

“Learning idioms”, classroom style, is not just a problem of memory or context. The killer is that in any language it is not the case that there are just two kinds of expression, a) sentences, b) idioms. It is not like water and ice. It is more like the colour spectrum of the rainbow – the colours shade into each other across the whole spectrum of frequencies.

Related to the fuzzy question of what makes an idiom an idiom is another shocking reality, at least for language learners. The “grammar rules” you learn from textbooks are NOT a simple recipe for making English or Chinese or any other language. If you memorized all the grammar rules in the best textbook and all the words in the biggest dictionary, even with good cultural knowledge you would still not be able to speak or to write. It is a sad fact that hardly any teachers understand that truth (let alone their students, or school administrators). In fact, very few academics acknowledge it either, whatever their private beliefs. You might just as well ask a roomful of priests whether they doubt the existence of their god. Such sanctified truths were a big reason I eventually walked away from my second PhD candidacy. That PhD topic was exactly on the question of what an idiom might be.

So do I have a better answer? In a way yes, but it will hardly fit a doorstop interview soundbite, or even a Facebook post. Here is a dangerously brief hint about why lists of grammar rules and words do not amount to a mental language machine:

The human brain is a parallel processor, not a linear processor like a digital computer. It works with probabilities. Whole sets of probabilities are evaluated simultaneously, with outcomes merged to become input to new probability calculations, and with remarkable speed out of that mess comes a sentence. I read somewhere that an early satellite was lost because of a minor punctuation error in the controlling computer code. That's linear, step-by-step programming: economical, rigidly rule-based, but fragile. Natural language generation is not like that. It has massive redundancy and approximate meanings. Near enough is good enough. But it is robust. Even idiots and savants can talk with each other. With natural language creation, streams of words or word-sets collect like a sketch progressing to an oil painting, progressively governed by many kinds of probabilities. The final word-painting has a slightly different meaning for everyone who encounters it after transmission, and each listener reverse-engineers the word assembly according to probabilities that they have learned.

What does all this stuff mean for your life? What does the untidy reality of language making mean for the language learner? Thinking about these questions might not be important for you. Mostly your clever subconscious brain will get to work and do it all anyway. That is really lucky. If you had to pass an exam on the science behind language, we would still be swinging through the trees like chimpanzees. The business of a linguist though is to figure out just what is going on in your clever brain (not the street usage of "linguist" here, as in "knows many languages", but the technical use, being the scientific study of how languages work).  Maybe you are just a little bit curious about the linguistic question, about how you really make language. Again, in this place I can only give you a hint.

The key term is “collocation”. If you speak the word “Fred”, only a limited number of words in English can follow this word. Each of the words which can follow “Fred” has a certain probability of following. If you “know English”, you know this probability set subconsciously. If you happen to know a guy called “Fred”, and know his personality, the probability of which word will follow his name changes. Let’s say Fred is a wife beater. Then “hit” has a high probability of following the word “Fred”. Also “his” will probably follow “hit”. The more words you speak in that sentence, the more predictable each following word becomes. In fact, if you don’t hear one or two words, your brain might kindly put in the high probability “missing words” and you won’t even be aware of it. Later you might swear to a judge with your hand on a stack of bibles that the sentence contained those missing words (but maybe it didn’t!).
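For the curious, the "Fred" example can be imitated in a few lines of Python. This is only a rough sketch: the four-sentence corpus is invented for illustration, the function name `next_word_probabilities` is hypothetical, and counting adjacent word pairs is a crude stand-in for whatever the brain really does. It simply estimates, for each word, how likely each following word is.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration (an assumption, not real data).
corpus = [
    "fred hit his thumb",
    "fred hit his brother",
    "fred hit the ball",
    "fred likes his dog",
]

def next_word_probabilities(sentences):
    """Estimate P(next word | current word) from adjacent word pairs."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        # Pair each word with the word that immediately follows it.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    # Turn raw counts into relative frequencies for each word.
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }

probs = next_word_probabilities(corpus)
print(probs["fred"])  # 'hit' follows 'fred' in 3 of 4 sentences
print(probs["hit"])   # 'his' follows 'hit' in 2 of 3 cases
```

Change the corpus (that is, change what you know about Fred) and the probabilities change with it, which is the whole point.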

When you “know a language” what you really know is not a list of “grammar rules”. Grammar rules are just names for some high probability patterns, but they are not enough alone to make language with. No, what a native speaker knows subconsciously are millions of probabilities for word collocations, plus countless cultural and social contexts which will influence those probabilities.

Now here at last is the point about “idioms”. Idioms are just word collocations which have such a high probability of association that they become consciously recognized. However, there are vast numbers of other word collocation sets at various levels of probability and awareness. Some are quite frequent. Some are more rare, but still recognized by the members of the culture or sub-culture when they hear them. That is why we sometimes have that sense of déjà vu when people in our home culture are speaking. It seems that we have “heard it before”, but we cannot say exactly how or when.
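One way to put a number on "strength of association" is pointwise mutual information (PMI), a measure borrowed from corpus linguistics rather than anything claimed above. The sketch below is a toy under loud assumptions: the two-word phrases are invented, the function name `pmi` is hypothetical, and PMI is only one of several possible scores. It compares how often two words appear together with how often they would if they were unrelated.

```python
import math
from collections import Counter

# Invented two-word phrases (an assumption for illustration only).
phrases = [
    "red herring", "red herring", "red herring", "red car",
    "blue car", "blue car", "blue sky", "smoked herring",
]

pairs = [tuple(p.split()) for p in phrases]
pair_counts = Counter(pairs)
word_counts = Counter(word for pair in pairs for word in pair)
total_pairs = len(pairs)
total_words = sum(word_counts.values())

def pmi(first, second):
    """log2 of P(pair) / (P(first) * P(second)); higher means a tighter bond."""
    p_pair = pair_counts[(first, second)] / total_pairs
    p_first = word_counts[first] / total_words
    p_second = word_counts[second] / total_words
    return math.log2(p_pair / (p_first * p_second))

print(round(pmi("red", "herring"), 3))  # the idiom-like pair
print(round(pmi("red", "car"), 3))      # an ordinary free combination
```

In this toy data "red herring" scores higher than "red car". Be warned that PMI flatters rare pairs, so on real text it is usually combined with a minimum frequency threshold.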

Yes, I know this is all discouraging. How can you “learn” millions of probabilities? Of course you can’t, not consciously in a classroom. It needs a lifelong process of immersion in a culture and the way those people speak. Some people have a lucky knack for soaking this stuff up subconsciously, and in my experience they usually have no ability at all to explain what they have achieved. Professional language teachers, in my experience, rarely have much understanding of what their students’ brains are doing when they “learn a language”. That need not mean that they are bad teachers. From experience the best of them have learned some paths that students can take on the language learning journey. Their teacher explanations might peddle mythology, but it often doesn't matter if the trip seems interesting.

Natural language learning is the single most difficult thing that the human brain can do. However, our brains seem to be genetically adapted for this incredible language learning process, so most people cannot even see the miracle for what it is. That is why any backpacker can parachute into a foreign country, stand in front of a class and think they are “teaching English”.

Well, in this age we must have an app for everything. Mere human brains might be too hard to think about, so here is a bit of tech-candy as a going-away present: for the exasperated and curious, Google gives us one of the most fantastic language tools I have ever seen: the Google Ngram Viewer. Just type in any word or phrase, or list of words or phrases. It will show you the frequency of that word or phrase or idiom for every year from 1800 up to now. That can tell you many things (e.g. changes in knowledge, history, attitudes, fashion, politics, science, technology etc). Have fun.


