The Probable Language Brain


Thor May

Brisbane 2013, 2015






Abstract: Let us suppose that you are a research linguist, tormented by some doubts and questions about the state of your profession, and not constrained by having to repeat a catechism of "known truths" to Linguistics 101 students, and not worried about employment tenure. How would you actually go about tackling "the central problem of linguistics", namely how we acquire and maintain knowledge of the probability of systemic relationships in a language?


Here are two simple pragmatic truths :


a) if you ask me the product of 9x8 I can tell you instantly : 72


b) if you ask me the product of 9x14 I have to calculate out each digit, then remember to add the results. It is slow and I might easily make a mistake. That is because in my primary school they only made us memorize up to 12x12.


The first act, a) is performed courtesy of my procedural memory and as a product of a physical neuronal relationship. (Procedural memories are routines acquired by practice until they become subconscious, such as the skill of driving a car. Psychologists would probably call the neuronal relationship some kind of "long term memory"). I am unlikely to ever forget the answer to 9x8, but growing that association was hard. It took a lot of childhood practice.


The second act, b) is performed by the conscious application of rules I have learned. Deliberate multiplication and addition seems to take place in a workroom next to my declarative memory. (Declarative memories are learned facts accessible to conscious recall. Psychologists would probably call the workroom "short term memory"). On a bad day I might stumble trying to apply the rules of arithmetic. Large numbers of people never become any good at it.
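The contrast between acts a) and b) can be caricatured in a few lines of Python. This is only a loose sketch of the analogy, not a claim about how brains store anything: the lookup table stands in for the drilled 12x12 associations, and the function names (`recall`, `work_out`, `multiply`) are invented for illustration.

```python
# Act (a): a memorized times table up to 12x12, standing in for procedural memory.
# (Building it with a * b is a shortcut for years of childhood drilling.)
TIMES_TABLE = {(a, b): a * b for a in range(1, 13) for b in range(1, 13)}


def recall(a, b):
    """Return a memorized product, or None if the pair was never drilled."""
    return TIMES_TABLE.get((a, b))


def work_out(a, b):
    """Act (b): schoolbook multiplication, one digit of b at a time, then add."""
    total = 0
    place = 1
    while b > 0:
        digit = b % 10
        total += a * digit * place  # partial product for this digit
        b //= 10
        place *= 10
    return total


def multiply(a, b):
    """Answer instantly if the pair is memorized, otherwise calculate slowly."""
    known = recall(a, b)
    return known if known is not None else work_out(a, b)


print(recall(9, 8))     # memorized pair
print(recall(9, 14))    # never drilled, so nothing comes back
print(multiply(9, 14))  # falls back to the slow, rule-based route
```

The point of the sketch is only that the two routes are different in kind: one is a single retrieval, the other is a sequence of conscious steps, each of which is a chance to slip.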


In one way, my knowledge of a) is somewhat similar to my knowledge of my native language. I don't have to sit there trying to apply "grammar rules" before I can talk. Rather, the flow of words, like the result of multiplying 9x8, emerges instantly.


However, the mathematical outcome of multiplying 9x8 is 100% certain. In this, the behaviour of natural language is rather different. Visible or audible language deals with long lists of tokens (morphemes, words) added together. Although these language lists seem linear, it need not follow that their actual creation or interpretation is a strictly linear process. In fact I will argue that such strict linearity is unlikely. A nicer metaphor than elementary math or formal logic might be to think of a child waking and stumbling to the bathroom. The family is disturbed, lights go on, then gradually a whole city wakes up as all kinds of routines and disturbances more and more densely prod each other into life. These events are broadly predictable, but the details always vary. The mathematics describing such processes would be more akin to complexity theory than arithmetic. In other words, the relationship between a particular word occurring in an utterance and all the words which came before it is rarely 100% certain in either production or decoding. Rather that relationship is a probability. That probability is influenced by two different kinds of things : one internal to memory, and one external to "the world" (the context of the talking or writing).


The power of the personal memory in word relationships is very strong. It is called the collocational probability. That is, when we hear the word stream "We never knew what he was going to ...", as native English speakers we know that the following words will probably be "do next" or "say next", even though "propose" or even "ingest" would be perfectly grammatical in the sequence. Knowing a language means in some way knowing the probable relationship between each word in the language and all the other words, and groups of words in your vocabulary. Although a native speaker can no more utter the list of his complete vocabulary than he can "explain grammar", his probability knowledge of word relationships is quite secure, like the 9x8 association. Accessing it is fast and not usually error prone. Knowing collocational probability helps the language user to guess meanings at great speed, even under poor conditions. When we add our internal knowledge of probable word associations to the prompts of an external social situation, the outcome for meaning is usually highly predictable. Of course, speakers of a second language learned later in life have no such advantage.
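One crude way to make "collocational probability" concrete is to count which words follow a given context in a body of text. The sketch below does this for a tiny corpus that I have invented purely for illustration; the sentences, the two-word context window and the function name `next_word_probabilities` are all assumptions, and a real speaker's knowledge is obviously vastly richer than a table of counts.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only.
CORPUS = [
    "we never knew what he was going to do next",
    "we never knew what he was going to say next",
    "nobody knew what she was going to do next",
    "i wonder what he is going to do",
    "they asked what he was going to propose",
]


def next_word_probabilities(sentences, context_size=2):
    """Estimate P(next word | preceding context) from raw counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    # Convert the counts for each context into relative frequencies.
    return {
        context: {w: n / sum(followers.values()) for w, n in followers.items()}
        for context, followers in counts.items()
    }


PROBS = next_word_probabilities(CORPUS)
for word, p in sorted(PROBS[("going", "to")].items(), key=lambda item: -item[1]):
    print(f"going to {word}: {p:.2f}")
```

In this toy corpus "do" follows "going to" three times out of five, while "say" and "propose" each appear once, and "ingest" never appears at all even though it would be grammatical. That is the sense in which a collocation is probable rather than merely permitted.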


How we acquire and maintain knowledge of the probability of systemic relationships in a language is, in my view, the central problem of linguistics. None of the solutions I have seen suggested for the linguistic problem have looked convincing to me. That is very interesting. We have a challenge (and I certainly don't have any pat answers either!). For your ordinary workaday language teacher this may also mean that the expensive textbooks and courses which qualified her, and which gave her a "scientific narrative" to explain what happens in the minds of her students, were, well, a kind of humbug. That is sure to be a very unpopular proposal, but not outrageous. For example, the training that a 16th Century medical doctor undertook was also a kind of humbug in this sense (and 21st Century medical doctors ...?).


"Grammar" as found in books by linguists and language teachers, is a very crude abstracted summary of some common patterns of collocational relationship, albeit extremely probable ones. Applying such grammar rules consciously to single words to make sentences is even slower than multiplying 9x14, and much more likely to generate mistakes. Of course, some people enjoy doing this just as some people enjoy doing arithmetic, but even they need the luxury of time for such a game. In speech you don't have the luxury of time. It is not a chess game or a math game. It is more like dancing. If you stop to think about where your feet should go at each step, the music will have moved on. In short, most people find that consciously applying grammar rules to words to make sentences is slow, boring and unreliable. For them, trying to use the summary of relationships found in a pedagogical grammar is an incredibly inefficient way to make language. Indeed, few people ever master it, even if they can parrot the grammar rules.


In my view, the "generative grammars" first popularized by Noam Chomsky in his monograph "Syntactic Structures" (1957) are also a misguided way to explain the relationship between the human brain and language. The generative paradigm seems directly traceable to Chomsky's training in formal logic. The ability of some humans to excel at performing such logic consciously is a very different proposition from claiming that all humans excel at it subconsciously and to the exclusion of other possible solutions to the mystery of how meaningful language is made.


It is worth nailing generative grammars in this context because they have become memes, mutating social viruses, and a career vehicle for large numbers of very clever academics who will defend them to the death, though we hope not for the 1500 years it took to dispose of Ptolemy's universe which had the sun revolving around the earth. Fortunately for linguistics, the high tide of rationalist theorizing seems to be in retreat at last, with empirical, evidence based research regaining credibility within more sophisticated paradigms of neural complexity. For example, see Evans 2014a, 2014b, 2015; LaPolla 2015. Golumbia (2015) very neatly traces the nature of Chomsky’s Cartesian mindset and scientific incoherence. Now both Ptolemy's heirs and Chomsky's heirs learned lots of interesting, even useful things on the way to being wrong. However, the generative grammar paradigm seems warped for a similar reason that deliberately trying to make language using pedagogical grammar rules is stupid. That is, generative grammars, while they may hint at some broad mental constraints, are crude, missing thousands if not millions of micro relationships. They have never been demonstrated to actually "generate" real, credible language strings free of garbage because they can't. For the live human user generative grammars are in principle insufficient and inefficient. They also lack explanatory power as models.


If any kind of computing analogy is appropriate at all our brains might be thought of as probability machines, working as parallel processors, albeit with outcomes which are frequently co-emergent with many other social elements. Certainly human brains are, on the whole, not logical machines working with binary arithmetic, and even recursive "generative" language programs could not change that.


While generative linguists claim to be modeling human mental processes, others with a more social disposition ignore the mind altogether and insist that nearly all useful knowledge about language can be extracted from social situations. Surely this is unbalanced too. In one sense the social environment is often little more than a stage for our disguised monologues.


It is nevertheless true that the probability of word relationships on any particular occasion can also be strongly influenced by the situation. The "situation" is the social context of the language, and the combination of words which I consider appropriate to apply in that situation. For the listener, this social context influences our attempts to understand the meaning of a string of words so strongly that even the wrong words can sometimes be "given" a meaning which that listener considers appropriate. There is good evidence that many people listen very carefully to the "way" in which words are said (the connotation) when estimating probable meaning, rather than noting the formal probabilities (i.e. the denotation of lexis in the context of grammar) which the word combinations are supposed to yield. That is one reason that politicians saying bad things in soothing ways can still get voted for.


One kind of evidence for what real language users DON'T do mentally comes from their attempted explanations to L2 learners of that language. Part of a language teacher's training is to acquire a repertoire of explanations to soothe the puzzlement of learners, especially adult learners. The analogy with what trainee doctors acquire is very close here. The explanations of both the teachers and the doctors might or might not be based on credible science. The explanations might or might not be helpful to their clients. Even in the 16th Century doctors were valued, and so were witch doctors at other times, not to mention priests. Human beings in their natures seek explanations, and the placebo effect can be powerful. However the naïf who asks an untutored "native speaker" to explain the mysteries of their language is more likely to excite humiliation. What the native speaker can do, quite usefully, is to say whether a phrase is "right" or "wrong". That is, they can respond like any language user to their inner knowledge of collocational probabilities.


Let us suppose then that you are a research linguist, tormented by some of the doubts and questions which have been raised above, and not constrained by having to repeat a catechism of "known truths" to Linguistics 101 students, and not worried about employment tenure. How would you actually go about tackling "the central problem of linguistics", namely how we acquire and maintain knowledge of the probability of systemic relationships in a language?


Well, firstly you might need a very long life ahead of you, and a very sharp brain, to get on top of the maze of contributing variables. Secondly, you might wonder how so many other very sharp researchers seem to have already gone astray, and why. Thirdly, considering the second matter, you might need a taste for the suicide mission of standing apart on the precipice of challenging "truth from authority", that is, the Ptolemy trap. If, in a daring mind experiment, this linguist imagines leaping all the barriers, where will he tread next?


One option for a fresh look at linguistics might be to step outside of the post-Enlightenment paradigm of experimental research in some ways. Any such attempt requires caution and clear thinking. The positivist paradigm of research in the hard sciences has a number of bedrock requirements. Above all, experiments must be independently repeatable, which requires a strict control of contributing variables and a completely explicit methodology. (In real life there have been repeated and continuing scandals on both fronts when the stakes are high). Social science research has been forced to fuzz this pure approach in various ways, then pretend that it hasn't by throwing up a smokescreen of statistical trickery. We know now that social science research can be useful in a confusing world, but that certainty is rarely one of its virtues. We also know that social science research can be, and has been, done to "prove" almost every favoured political posture.


Chomsky and his cohort of generative grammar academics did attempt another variant of violating "pure" research convention by harking back to 17th Century European rationalism (Golumbia 2015). Their limitation however was precisely the old limitation of cleverly arguing how many angels can dance on the head of a pin: conclusions are only as good as the premises which sustain them. Exclude inconvenient premises and you have screwed the "science". That is the road to scholastic sterility, which is supposed to be a medieval memory, but which actually permeates institutions of learning to this day.


The formal research approach of this school of generative linguistics was actually to create avatars, known as "ideal speaker-hearers" who took on nominal properties of real speaker-hearers, but only as defined by the model and only in selected environments. This linguistic avatar game had much in common with online virtual computer games. The way to "kill" an avatar was to have them use a linguistic string which some native speaker, somewhere, judged to be "wrong" or "non-felicitous". Then you resurrected the avatar to use a slightly different linguistic string which, hopefully, would get a pass by the natives. The plan was to eventually collect a large enough number of these consciously manufactured felicitous strings to predict which rules would generate them (and only them) reliably. Then the model could credibly represent the grammar of a "real" language. Except that it couldn't. Fifty years of playing this game has shown that it can't succeed, something that critics have long maintained, but these things have their own momentum.


One fatal flaw of the linguistic "ideal speaker-hearer" avatar kind of research has always been that the avatars are arbitrary and changeable at whim. The samples which each researcher's avatar collected as valid strings were essentially random fragments from multiple sources.


This is the old, old story of seven wise men describing different parts of an elephant and imagining universal truths about the whole elephant. After many heated arguments, the seven wise men might even agree about the form of a compromise imaginary elephant which incorporates their separate observations, but it is still an imaginary elephant. Of course, describing the elephant, or Nature if you like, is the quintessential problem of big-letter Science. There is no final solution to this dilemma, but we can say that in principle the smaller and more arbitrary the observed fragments of a system are, the less reliable any prediction about the whole system is likely to be. In the case of the linguistic elephant, there is clearly scope for improving the quality of observations which are made of the real language systems used by real speaker-listeners.


The single largest qualitative improvement which could be made to linguistic research, in my view, would be to narrow the number of observed speaker-listeners to ONE, that is one idiolect, and to multiply the range of observation of that idiolect. In other words, the researcher would be maximizing his comprehension of complexity in that single system.


Once the form and dynamics of one idiolect were understood in maximal detail, only then would it be appropriate to look at other idiolects for the common systemic elements of dialect and language.


A couple more metaphors come to mind here (one perhaps a bit too homely: I can't help myself). When I visit a doctor occasionally, I have learned to predict on average that this person will have about a 20% chance of offering useful information, and an 80% chance of being useless to dangerous. Usually the man has never seen me before, he is focused on some small part of my anatomy which is causing a problem, he is disposed by training and limited time towards a quick, drug based "solution", and his understanding of the human body as an integrated system in motion (I am a distance runner) is usually laughable. That is, the elephant for this doctor is a very imaginary beast which he tampers with at peril. And so it is with the doctor of linguistics ...


Now perhaps a more sober analogy. Engineers generally understand that some systems are scalable, and some systems are non-scalable. An engineer can build a small model of, say, a bridge, test it in a wind tunnel and predict with fair accuracy the stresses which will apply to an actual full sized bridge.


However, computer scientists know very well (to their chagrin) that although they can write a computer program of impressive complexity, even millions of lines of code, it is simply not possible to write a smaller, simpler computer program to model the behaviour of a larger, more complex computer program. They also know that every computer program ever written has had bugs which can only be eliminated by trial and error, and frequently generate new bugs in the process of correction.


There is a mathematical reason for the exasperating characteristics of computer programs: they are randomly discontinuous phenomena. The parts cannot reliably predict the behaviour of the whole.


Now when it comes to the dynamic behaviour of natural languages, they are definitely much closer to the computer science end of engineering than they are to the neatly scalable behaviour of mechanical engineering. However, to this point vast libraries of linguistic research have pretended that small, random fragments of observed linguistic behaviour from strangers can be assembled as scalable components of some imaginary linguistic elephant, and be used for predicting the form and behaviour of the massively complex linguistic system in my head or in your head.


Can't we do better than this?


Perhaps we can. This essay has been suggesting that generations of work by very clever people has been misdirected. That would be a hard complaint to take seriously if there were no alternative paradigm to measure the evidence against. As it happens there is such a paradigm in the broad fields of scientific endeavour. It relates to what has become the science of complexity, together with a whole complementary branch of mathematics. Complexity research turns out to be full of difficult challenges, so it may not be surprising that very few linguists have staked a career in it. However, there are some general principles in complex systems which relate front and centre to the phenomenon of natural languages. I can only mention them in the briefest way in an essay like this.


Complex systems are emergent. The term emergent suggests the absence of a superordinate causative agent. That is, such systems tend to be self-organizing, or in some contexts can be appropriately described as self-teaching (Ransom 2013). Holland (2014) points out that emergence is a property without sharp demarcation. There are degrees of emergence. Nevertheless, when such systems  do go through a process of emerging, their internal relationships become mathematically non-linear. In plain language, the whole is more than the sum of the parts. One of Holland’s examples is that individual molecules of water are not “wet”. The quality of wetness only emerges with a certain aggregation of water molecules. A second quality of emergent complex systems is that they contain independently functioning but related hierarchies:


“Hierarchical organization is … closely tied to emergence. Each level of a hierarchy typically is governed by its own set of laws. For example, the laws of the periodic table govern the combination of hydrogen and oxygen to form H2O molecules, while the laws of fluid flow (such as the Navier-Stokes equations) govern the behaviour of water. The laws of a new level must not violate the laws of earlier levels.” [Holland 2014, p.4]


Cognitive and computational linguists working in environments of artificial intelligence to emulate natural language processing (NLP) are now, of course, well aware of the complex systems properties of natural language and its characteristics of emergence. NLP models in artificial intelligence were dominated for many years by a logic based symbolic systems approach compatible with Chomsky’s ideas in generative linguistics. This kind of modelling in AI was able to meet certain constrained engineering needs but proved unable to generate anything like unlimited, well-formed natural language.


Alternative connectionist models working with the self-teaching properties of complex systems originally lacked the sophistication and computing support to provide adequate proof of concept demonstrations. Recently this has begun to change. Some recent rigorous research by Golosio et al (2015) claims to have developed a system, using adaptive neural gating mechanisms, which can self-learn from a tabula rasa state to a level of communicative competence equivalent to a four year old child. (Full documentation and data sources are available in the public domain). This is an exciting development if research replication fully substantiates it, and the 2.1 million artificial neurons  Golosio et al are working with can be scaled with enriched outcomes towards the 100 billion neurons of a human brain.


If you have any feeling for the multiple systems of language and their levels at all, the characteristics of emergent systems will surely strike a clear echo. A word is more than the sum of its morphemes, a sentence more than the sum of its words, a novel more than the sum of its sentences. The superordinate emergent quality at each level is what, in common parlance, we call meaning.


In our minds, if we reverse engineer the apparent constituents of a novel, a sentence, a word, a morpheme (or phoneme) and try to identify them as clearly defined classes we, or at least the linguists amongst us, are apt to find that the classes are indeterminate at the margins. Some nouns are more noun-like than other nouns (e.g. dog vs swimming), just as some dogs are more dog-like than other dogs. As it happens, some sentences are more sentence-like than other sentences, and some novels more novel-like than other novels. A number of researchers (Eleanor Rosch, George Lakoff and others) have called this effect prototype theory and done some excellent work. But prototype qualities are another of the common properties of emergent systems.


The underlying assumption of linear generative models of linguistics was that “well-formed sentences”, or well-formed sub-systems at other levels of hierarchy, were constituents with sharp category margins which could be atomized and reassembled according to rather simple and explicit rules. In principle it would indeed be possible to tip a soup of words and a handbook of the right syntactic rules into a proverbial computer and expect well-formed natural language to come out the other end.


The concept of natural language as a (very) complex emergent system renders generative models of linguistics incoherent. The underlying rules of the game are not linear, but exhibit the very different mathematics of non-linear behaviour. The outcomes of language creation are greater than the individual words which comprise the language.


At the beginning of this essay I said that learning a language was learning to predict collocations. I said that language use was a probability game. On the face of it, predicting the probability of a collocation would be perfectly compatible with a linear generative model, even if the task with an enormous number of words in play was statistically overwhelming. Yet on the face of it, predicting the probability of a collocation within the non-linear hierarchies of language according to a complexity model might seem impossible. After all, another property of complex systems is that outcomes are inherently unpredictable. In such systems, each iteration is a bit different.


There is an answer to the apparent contradiction implicit in predicting collocations within a complexity based system. The solution is made possible by the constrained indeterminacy of categories and occurrences themselves. That is, indeterminacy in complex systems is bounded. Meteorologists can predict with passable accuracy that a certain number of storms will strike your city in a given season. They cannot predict when and where those storms will strike. A listener can predict with useful accuracy what his interlocutor is likely to say, what words he is likely to use, and in which general syntactic configuration. His mind prepares resources to manage this. The listener however cannot be certain when, where and quite how a speaker will use particular words, only their likelihood within the social bounds of the situation.
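The storm example can be mimicked with a small simulation. This is a sketch under loudly stated assumptions: a 90-day season, a 10% chance of a storm on any given day, and days treated as independent of one another. None of these figures comes from meteorology; they are invented simply to show what bounded indeterminacy looks like, and the function name `simulate_season` is hypothetical.

```python
import random


def simulate_season(rng, days=90, storm_chance=0.1):
    """Return the set of day numbers on which a storm strikes (assumed independent days)."""
    return {day for day in range(days) if rng.random() < storm_chance}


rng = random.Random(2015)  # fixed seed so the run is repeatable
seasons = [simulate_season(rng) for _ in range(200)]
counts = [len(season) for season in seasons]
average = sum(counts) / len(counts)

print(f"average storms per season: {average:.1f}")
print(f"fewest in any season: {min(counts)}, most: {max(counts)}")
print("first two seasons have identical storm days:", seasons[0] == seasons[1])
```

Across many simulated seasons the number of storms stays close to nine, yet the particular days on which they fall differ from one season to the next. The aggregate is predictable within bounds while the individual occurrence is not, which is the property a listener exploits when anticipating an interlocutor's words.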


The configuration of a possible language brain is one of life’s most intriguing mysteries. For most people it remains an invisible miracle within plain sight. I noticed the miracle long ago, went in search of some answers, then followed paths of explanation set out by those who had some confidence they understood (and published books to prove it). In the end it seemed that these sages were largely talking to themselves, in spite of some useful hints along the way.


I wondered at my own incompetence at second language learning, why language teachers as a species mostly seemed to loathe analytic linguistics, why the success or not of students I taught English to as a second language seemed to bear no correlation to talents for formal, linear analytic thought. My conclusion was a deep suspicion that the narratives about grammars which were lectured to “applied linguistics” students hoping to be teachers contained a large mix of academic fantasy. Yet I was not wise or clever enough to invent a better narrative myself.


The task ahead of us is to find a credible narrative to explain just how the languages we learn and teach can possibly come into being, then function in workable ways. My hopeful suspicion is that the study of natural languages as complex emergent systems can set us on a productive path to that understanding.





Afternote: As with many of the articles I have put into public forums, The Probable Language Brain is not a production of beaver-like scholarship, scattering the names of illustrious researchers and long lists of respectable references. There is a place for all of that.


My purpose here has been to synthesize some very interesting questions about language in a way which encourages thinking and debate. I have tried to make the arguments as accessible as possible.


My own conclusions may of course be wrong on many levels, but by presenting issues in this cut-down format the hope is that both the proponents and antagonists to various propositions may be moved to present their own cases with more persuasive clarity.


For what it is worth, my own encounters with formal linguistics have fluctuated in intensity, but stretch back as far as the 1970s, beginning with the linear models of generative linguistics, and finally with me walking away from two earlier doctoral candidatures after years of mounting skepticism.





Golosio B, Cangelosi A, Gamotina O, Masala GL (2015) A Cognitive Neural Architecture Able to Learn and Communicate through Natural Language. PLoS ONE 10(11): e0140866. doi:10.1371/journal.pone.0140866 . Available online @


Golumbia, David (2015) "The Language of Science and the Science of Language: Chomsky's Cartesianism". Diacritics, Volume 43, Number 1, 2015, pp. 38-62. Online at @


Evans, Vyvyan (2014a) "The language myth: Why Language is not an instinct". UK: Cambridge University Press


Evans, Vyvyan. (2014b) "Real talk: For decades, the idea of a language instinct has dominated linguistics. It is simple, powerful and completely wrong". Aeon magazine online @


Evans, Vyvyan (2015) "The structure of scientific revolutions: reflections on radical fundamentalism in language science". Language in the Mind blog, Psychology Today, online @


Holland, John H. (September 2014) "Complexity: A Very Short Introduction". Kindle edition online @


Lakoff, George (2008-08-08). Women, Fire, and Dangerous Things (p. 24). University of Chicago Press. Kindle Edition. Amazon online @

LaPolla, Randy J. (2015) "Review of The language myth: Why Language is not an instinct". online @


Larsen-Freeman, Diane (29 March 2014) "Complexity Theory: Renewing Our Understanding of Language, Learning, and Teaching". [video] TESOL International Conference, Oregon USA. online @  


Valiant, Leslie (2013) "Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World". Basic Books. Kindle edition available online @   








Professional bio: Thor May's PhD dissertation, Language Tangle, dealt with language teaching productivity. Thor has been teaching English to non-native speakers, training teachers and lecturing linguistics, since 1976. This work has taken him to seven countries in Oceania and East Asia, mostly with tertiary students, but with a couple of detours to teach secondary students and young children. He has trained teachers in Australia, Fiji and South Korea. In an earlier life, prior to becoming a teacher, he had a decade of drifting through unskilled jobs in Australia, New Zealand and finally England (after backpacking across Asia in 1972).  




All opinions expressed here are entirely those of the author, who has no aim to influence, proselytize or persuade others to a point of view. He is pleased if his writing generates reflection in readers, either for or against the sentiment of the argument.


The Probable Language Brain ©Thor May 2013, 2015; all rights reserved

