Testing for Teaching; Teaching to What?


Thor May



The outline which follows analyses the two halves of a language teacher's profession:


a) The first half is daily classroom practice: what is taught and how is it evaluated?

b) The second half of a teacher's profession is to know or at least estimate what is going on in the brains of her students: what is learned and how is it learned?


Teaching is a simulation machine. Learning is for life. The implicit professional challenge is in making the simulation useful for living.


Note: The discussion here reflects a teacher’s interest in actual language learning, rather than that special game which sets out to manufacture “the IELTS/TOEFL performing clone”. Also, I have termed these notes an “outline”. It would be an abuse of language to call them an academic paper in any finished sense, and the absence of referencing reinforces that. There are, after all, whole academic faculties devoted to the study of testing, though unfortunately most teachers have never heard of them. Still, for those in a hurry, these reflections of my own may crystallize some of the questions which, sooner or later, will trouble any thoughtful teacher.


[A] Teaching Practice


1. Mass education is generally a poor medium for language learning, language teaching or language testing.


2. We are stuck with mass education, so what is discussed here relates to that. Individuals, especially intelligent individuals, planning their own learning can do much fancier things and succeed, or more basic things and still succeed. The second part of this short discussion explores some properties of mind that may partly explain why individuals rather than classrooms full of captive students are better at real language learning.


3. In mass education what is tested determines, mostly, what is learned. Ideally, therefore, well-designed testing will encourage desirable learning. Unfortunately, tests in mass education are almost universally designed for administrative convenience, with little thought for their backwash effects on learning.


4. Since administrative structures, agendas, requirements and hierarchies are inseparable from mass education, they have to be catered for in testing. One also aims for a design that facilitates learning, sometimes a little forlornly.


5. So what CAN be tested? At the classroom level in language education the teacher faces exceptional difficulty in evaluating SPOKEN language reliably.


Reading can be tested indirectly by comprehension tests. Writing can at least be taken away to evaluate at leisure (although holistic evaluation is time consuming and prone to inconsistency). Listening can be evaluated indirectly by dictation (if students can write), or by task completion.


Spoken language, though, requires one-to-one, instant evaluation unless it is recorded. Relatively few classes have facilities to record large numbers of students simultaneously, or to invigilate them for honest recording via computer or internet. One-to-one work leaves the rest of the class at a loose end (especially undesirable in a testing situation). As with writing, the evaluation of speaking is reducible to a large number of atomistic components which might be scored, but evaluation of the holistic outcome is liable to much subjective variation.


6. If you can't HEAR a second language reliably, the chances are that your construction, pronunciation and appropriateness in spoken language will not be adequate either. With this rationale, some teachers are content to test listening competence and forgo formal tests of speaking competence. Indeed there is some statistical support for the correlation of speaking and listening presupposed here. It is true that listening cues language recall while speaking involves the much more difficult process of active (i.e. not cued) recall, and that passive vocabulary far exceeds active vocabulary, even in a first language. Nevertheless, evaluation tasks can calibrate for this.


7. Speaking is a composite of structuring a message that is


a) coherent, cohesive and with comprehensible discourse presuppositions;


b) socially appropriate;


c) decodable for pronunciation and intonation.


a) and b) can be more or less simulated by a written response. We have special registers of writing for recording speech. Therefore one solution for the teacher in mass education is to test simulated responses for a) and b), while partly or wholly forgoing formal evaluation of c).


This is indeed a very clumsy simulation. A dance without the music is not a dance. A graduate from a curriculum based on this kind of testing might well be a tongue-tied speaker. However, while the atomization of a), b) and c) will not yield an adequate whole language experience, it might be accepted in classrooms as PART of the process. Such evaluation would have to be combined with other things, such as participation in tasks, role plays and discussions. Participation can be scored on a fairly crude scale such as enthusiastic/average/lax/nil. The idea is to give students some motive for polishing the whole as well as the parts.


8. An example of the simulation described in 7. might be a combined test of listening accuracy/comprehension and written dialogue response by the student. That is, the student would have to write down a dictated dialogue which provided the language for speaker A, but in which the responses of speaker B were not provided. He would then add responses for speaker B to what he had understood of speaker A.


Live speech differs from this simulation in more than its oral format: it is constructed in real time and assisted by body language and other extralinguistic context. Still, the teacher is unavoidably restricted. Formal language testing never evaluates the whole enchilada. The simulation compromise is better than nothing, and does have the advantage of administrative manageability.


8b. The real problem with 8 is that for the learner himself, simulation is a pretty unrewarding process. It is the spontaneity and instant feedback of live speech which gives it an emotional kick. When the learner is able to "fly" a little - respond to genuine communication - he has a sense of achievement that contributes greatly to both motivation and memory. Mass education of course is all about simulation.


9. Talking of whole enchiladas, language teachers are always condemned to tasting only a bit of the meal. For one thing, we are only entitled, really, to test what we teach. What we teach is only ever a fragment of language, regardless of how systematic or how chaotic the overall curriculum is. To take another analogy, this time from martial arts, we can teach a few punches and blocks. When the learner tries to put those together in a real match with a rapid choreography of moves, things are apt to fall apart pretty quickly. We can study fragments of anything, but when it comes to complex dynamic systems, the only real description of them is actual, total performance. The teaching profession is therefore one with modest aims.



[B] The Mental Environment of Learning


Here the discussion will turn from classroom testing techniques to the mental conditions that students bring to a classroom, and the bearing of that on what can be achieved in classroom environments.


10. The learning of fragments is superbly handled by a uniquely human facility called DECLARATIVE MEMORY. Declarative memory is what you use when you recite a list of "facts". Adults are better at this than children, so not surprisingly a fair bit of research claims that adults are better than children at most aspects of language learning, intonation and pronunciation excepted. I suspect that what this research really shows is that most formal language evaluation draws on declarative memory.


11. The performance of complex, dynamic skills is not unique to humans. The wolf closing on a fleeing antelope is making breathtaking mathematical calculations of speed and distance. Of course, the wolf is not "aware" of these calculations - it is all autonomic and subconscious. Similarly, when we speak our native language, the construction proceeds with astounding speed and accuracy at a level entirely beyond our conscious management. The extent to which the wolf has learned its skills is open to debate. There is no doubt, however, that children learn their first language socially (even though there is heated argument about the level of detail in the inherited mental template which makes this possible). The memory that we use to speak language natively, or to exercise the skill of riding a bicycle, is called PROCEDURAL MEMORY.


12. Much about the nature of procedural memory remains obscure. It is subconscious, and very resistant to conscious formulation by introspection (which is why linguistics is so arcane for most people). Studies of brain-damaged individuals, and now MRI scanning, suggest that it is distributed differently in the brain from the centers of declarative memory. The input of declarative memory to procedural memory is, at the moment, not understood well at all: there is much disagreement and confusion.


13. One likely explanation of the nature of procedural memory is that it is a kind of probability machine. This fits well with what is known about the physiology of nervous systems, which depend upon cascading electrical signals.


Language is certainly a probability game. You listen to a stream of sound or read a sentence. As you perceive one word, your brain is furiously calculating the probabilities of what can follow it. The more you hear of a sentence and know of a context, the better your guesses get. Thus "what am I going to say ..." might be followed by next / now / when / believing. "Believing" is grammatically possible, but surely far less probable than "next". This knowledge of collocation probabilities is what makes it possible for us to speak and listen so quickly. It is knowledge which comes from a huge amount of live experience. Acquiring it is the hardest part of language learning, yet few teachers, let alone students, are really aware of the nature of this task.
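For readers who like to see such things made concrete, here is a rough Python sketch of the probability game. It is only a toy under stated assumptions, not a claim about how brains do it: the four "experienced" sentences are invented for illustration, the function names are hypothetical, and the method (counting which word follows which) is the simplest possible stand-in for collocation knowledge.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word is immediately followed by each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def next_word_probabilities(follows, word):
    """Return {next_word: relative frequency} for words seen after `word`."""
    counts = follows.get(word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the word was never heard, so there is nothing to guess from
    return {nxt: n / total for nxt, n in counts.items()}


# Invented toy "experience" (an assumption for illustration only).
corpus = [
    "what am i going to say next",
    "what am i going to say now",
    "what am i going to say next",
    "what did you say when he left",
]

model = train_bigrams(corpus)
probs = next_word_probabilities(model, "say")

# Print the candidate continuations, most probable first.
for word, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.2f}")
```

With so little experience to draw on, the sketch can only rank the continuations it has actually met, and a word like "believing", which never follows "say" in the toy sentences, gets no probability at all. A fluent speaker presumably gives it a small but real chance, which hints at how much live exposure the real task demands.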


14. It is clear that the acquisition of declarative knowledge is relatively efficient and measurable in humans. Thus our mass education systems thrive on it. It is also evident that the acquisition of procedural knowledge, especially relating to very complex dynamic systems like language, tends to be very slow and requires a great deal of holistic practice. It is also very difficult to measure objectively, and hence tends to be downplayed or ignored in mass education systems.


15. Everyone learns a first language as an infant, but adults are much less uniformly successful in mastering a second language at a procedural level (as opposed to regurgitating facts). It would seem, then, that the infant brain is more accessible to fairly rapid procedural learning than the adult brain. Given the numerous autonomic skills (e.g. walking) that infants have to learn apart from language, this makes some sense.


16. It is also apparent that those adults who do master a second language are often (even typically) very poor at pedagogical grammar. That is, frequently they do not excel at the declarative enumeration of grammatical 'facts', and abhor conscious linguistic analysis. Somehow they have found a way to develop procedural knowledge of a language without being able to consciously analyse the process.


17. Procedural learning apparently depends, above all, on repetition, and some successful adult learners have evidently been able to maintain a kind of open alertness under conditions that would drive many others to revolt, distraction or just sleep. Maybe a lack of imagination helps sometimes! There is obviously scope for designing curriculums and methods that involve a great deal of holistic repetition, but in ways which don't lead the larger body of learners to reject the process out of boredom.


18. The mental preparation of students is also clearly critical in dealing with procedural memory, but not necessarily in ways that are ideal for declarative memory. Perhaps the Lozanov idea of quasi-hypnosis (Suggestopedia) has a place here. Managing the subconscious is a slippery business, more studied in Eastern traditions of meditation than Western philosophies. Nevertheless, it has popular acceptance as a kind of magic learnable by some fictional individuals: Harry Potter, Luke Skywalker etc. It will not translate easily into the crassness of mass education systems.


The Great Conundrum


Teachers test their own simulations of life. Student brains learn to perform these declarative simulations. In life however, what is useful for the language learner is automatic procedural performance. In language classrooms this is only accidentally learned and poorly measured.


It seems that we need twin objectives for the development of language teaching in mass education systems:


a) a way to develop procedural memory and procedural skills in a second language for large numbers of people;


b) a way to measure this acquisition of procedural skills that is administratively manageable, reliable and actually encourages learning. 


At the moment I do not know of anyone who has found a credible way to achieve these objectives on a large scale. The failure rate for most procedural foreign language acquisition in most places of mass education is very high (one famous estimate puts the failure in America to achieve useful competence in L2 at around 95%). We remain an accidental profession.



Thor May's PhD dissertation, Language Tangle, dealt with language teaching productivity. Thor has been teaching English to non-native speakers, training teachers and lecturing in linguistics since 1976. This work has taken him to seven countries in Oceania and East Asia, mostly with tertiary students, but with a couple of detours to teach secondary students and young children.

