14 Testing overall ability
The previous five chapters have given advice on the testing of different
abilities. The assumption has been that we need to obtain separate information on each of these abilities. There are times, however, when we do
not need such detailed information, when an estimate of candidates’
overall ability is enough.
One way of measuring overall ability is to build a test with a number
of components: for example, reading, listening, grammar and vocabulary. Specifications are written for the individual components and these
are incorporated into specifications for the entire test, with an indication of the weight that each of the components will be given. The scores
on the different components of the test are added together to give an
indication of overall ability. This is what happens in many proficiency
tests. Even if the scores on the individual components are given, they
may be ignored by those who use the test scores.
But building a big test of this kind is hardly economical if we simply
want to use test results for making decisions that are not of critical
importance and in situations where backwash is not a consideration.
One obvious example of this is placement testing in language schools.
Typically, all that is asked of language school placement tests is that they
assign people to a level in that school. If people are misplaced by the
test, they can usually easily be moved to a more appropriate class;
provided that not too many such moves are called for, this is not a
problem. And since people do not normally prepare for placement tests,
there is no need to worry about possible backwash. In these circumstances, it turns out that there are fairly straightforward and economical ways of estimating overall ability.
Before looking at techniques for doing this, however, it is worthwhile
giving a little thought to the concept itself. The notion of overall ability
is directly related to the commonsense idea that someone can be good
(quite good, or poor) at a language. It makes sense to say that someone
is good at a language because performance in one skill is usually a
reasonable predictor of performance in another. If we hear someone
speaking a language fluently and correctly, we can predict that they will
also write the language well. On some occasions, of course, we may be
wrong in our prediction, but usually we will be right. This is hardly
surprising, since, despite their differences, speaking and writing share a
great many features, most obviously elements of grammar and vocabulary. It is essentially this sharing of features that allows us to measure
overall ability economically.
One last thing to say before looking at techniques is that some of
them are based on the idea of reduced redundancy. When we listen to
someone or read something, there is more information available to us
than we actually need in order to interpret what is said or written.
There is redundancy. Native speakers of a language can cope well when
this redundancy is reduced. They can, for example, understand what
someone is saying even though there are noises in the environment that
prevent them from hearing every sound that is made. Similarly, they can
make out the meaning of the text of a newspaper that has been left
outside in the rain, causing the print to become blurred. Because non-native speakers generally find it more difficult to cope with reduced
redundancy, the deliberate reduction of redundancy has been used as a
means of estimating foreign language ability. Learners’ overall ability
has been estimated by measuring how well they can restore a reduced
text to its original form.
Varieties of cloze procedure
In its original form, the cloze procedure reduces redundancy by deleting
a number of words in a passage, leaving blanks, and requiring the
person taking the test to attempt to replace the original words. After a
short unmutilated ‘lead-in’, it is usually about every seventh word that
is deleted. The following example, which the reader might wish to
attempt, was used in research into cloze in the United States (put only
one word in each space). The answers are at the end of this chapter.
[The first cloze passage, ‘What is a college?’, appears here in the original.]
Some of the blanks you will have completed with confidence and ease.
Others, even if you are a native speaker of English, you will have found
difficult, perhaps impossible. In some cases you may have supplied a
word which, although different from the original, you may think just as
good or even better. All of these possible outcomes are discussed in the
following pages.
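For readers who want to experiment, the mechanical deletion procedure is easy to automate. The sketch below (the function and parameter names are my own) deletes every nth word after an unmutilated lead-in and returns the gapped text together with an answer key:

```python
import re

def make_cloze(text, nth=7, lead_in_words=15):
    """Delete every nth word after an intact lead-in, returning the
    gapped text and the answer key of deleted words."""
    words = text.split()
    gapped, key = [], []
    for i, word in enumerate(words):
        pos = i - lead_in_words
        if pos >= 0 and pos % nth == nth - 1:
            # Keep trailing punctuation on the blank; delete only the word.
            core = re.sub(r"[^\w']+$", "", word)
            key.append(core)
            gapped.append(f"({len(key)}) ______{word[len(core):]}")
        else:
            gapped.append(word)
    return " ".join(gapped), key
```

Trialling (as recommended later in this chapter) would then show whether the blanks produced by a given passage behave as usable items.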
There was a time when the cloze procedure seemed to be presented
almost as a language testing panacea. An integrative method, it was
thought by many to draw on the candidate’s ability to process lengthy
passages of language: in order to replace the missing word in a blank,
it was necessary to go beyond the immediate context. In predicting the
missing word, candidates made use of the abilities that underlay all
their language performance. The cloze procedure therefore provided a
measure of those underlying abilities, its content validity deriving from
the fact that the deletion of every nth word meant that a representative
sample of the linguistic features of the text was obtained. (It would
not be useful to present the full details of the argument in a book of
this kind. The interested reader is referred to the Further reading
section at the end of the chapter.) Support for this view came in the
form of relatively high correlations between scores on cloze passages
and total scores on much longer, more complex tests, such as the
University of California at Los Angeles (UCLA) English as a Second
Language Placement Test (ESLPE), as well as with the individual
components of such tests (such as reading and listening).
The cloze procedure seemed very attractive as a measure of overall
ability. Cloze tests were easy to construct, administer and score. Reports
of early research seemed to suggest that it mattered little which passage
was chosen or which words were deleted; the result would be a reliable
and valid test of candidates’ underlying language abilities. Unfortunately, cloze could not deliver all that was promised on its behalf. For one
thing, it turned out that different passages gave different results, as did
the deletion of different sets of words in the same passage. A close
examination of the context that was needed in order to fill a blank
successfully (and studies of what context people actually used) showed
that it rarely extended far from the gap. Another matter for concern was
the fact that intelligent and educated native speakers varied quite
considerably in their ability to predict the missing words. What is more,
some of them did less well than many non-native speakers. The validity
of the procedure was thus brought into question.
Selected deletion cloze
There seems to be fairly general agreement now that the cloze procedure
cannot be depended upon automatically to produce reliable and useful
tests. There is need for careful selection of texts and some pre-testing.
The fact that deletion of every nth word almost always produces problematical items (where, for example, it is impossible to predict the missing word),
points to the advisability of a careful selection of words to delete, from
the outset. The following is an in-house cloze passage, for students at
university entrance level, in which this has been done. Again the reader
is invited to try to complete the gaps.
Choose the best word to fill each of the numbered blanks in the
passage below. Write your answers in the space provided in the
right hand margin. Write only ONE word for each blank.
Ecology
Water, soil and the earth’s green mantle of plants
make up the world that supports the animal life of the
earth. Although modern man seldom remembers the
fact, he could not exist without the plants that harness
the sun’s energy and manufacture the basic food-stuffs
he depends (1) __________ for life. Our attitude
(2) __________ plants is a singularly narrow
(3) __________. If we see any immediate utility
in (4) __________ plant we foster it.
(5) __________ for any reason we find its presence
undesirable, (6) __________ merely a matter of
indifference, we may condemn (7) __________ to
destruction. Besides the various plants (8) __________
are poisonous to man or to (9) __________ livestock,
or crowd out food plants, many are marked
(10) __________ destruction merely because,
according to our narrow view, they happen to
(11) __________ in the wrong place at the
(12) __________ time. Many others are destroyed
merely (13) __________ they happen to be associates
of the unwanted plants.
The earth’s vegetation is (14) __________ of a web
of life in which there are intimate and essential
relations between plants and the earth, between
plants and (15) __________ plants, between plants
and animals. Sometimes we have no (16) __________
but to disturb (17) __________ relationships, but
we should (18) __________ so thoughtfully, with full
awareness that (19) __________ we do may
(20) __________ consequences remote in time and
place.
(1) _____________
(2) _____________
(3) _____________
(4) _____________
(5) _____________
(6) _____________
(7) _____________
(8) _____________
(9) _____________
(10) _____________
(11) _____________
(12) _____________
(13) _____________
(14) _____________
(15) _____________
(16) _____________
(17) _____________
(18) _____________
(19) _____________
(20) _____________
The deletions in the above passage were chosen to provide ‘interesting’
items. Most of them we might be inclined to regard as testing ‘grammar’,
but to respond to them successfully more than grammatical ability is
needed; processing of various features of context is usually necessary.
Another feature is that native speakers of the same general academic
ability as the students for whom the test was intended could be expected
to provide acceptable responses to all of the items. The acceptable
responses are themselves limited in number. Scores on cloze passages of
this kind in the Cambridge Proficiency Examination have correlated very
highly with performance on the test as a whole. If cloze is to be used to
measure overall ability, it is this kind which I would recommend. General
advice on the construction of such tests is given below.
Conversational cloze
The two passages used to create cloze tests above are both quite formal
prose. If we want our measure of overall ability to reflect (and hopefully
predict) oral as well as written ability, we can use passages which represent spoken language. The next passage is based on a tape-recording of
a conversation. As this type of material is very culturally bound, probably only a non-native speaker who has been in Britain for some time
could understand it fully. It is a good example of informal family conversation, where sentences are left unfinished and topics run into each
other. (Again the reader is invited to attempt to predict the missing
words. Note that things like John’s, I’m, etc. count as one word. Only
one word per space.)
[The conversational cloze passage, ‘Family reunion’, appears here in the original.]
This ‘conversational cloze’ passage turned out to be a reasonable predictor of the oral ability of overseas students (as rated by their language
teachers) who had already been in Britain for some time. It suggests that
we should base cloze tests on passages that reflect the kind of language
that is relevant for the overall ability we are interested in.
Advice on creating cloze type passages
1. The chosen passages should be at a level of difficulty appropriate to
the people who are to take the test. If there is doubt about the level,
a range of passages should be selected for trialling. Indeed it is always
advisable to trial a number of passages, as their behaviour is not always
predictable.
2. The text should be of a style appropriate to the kind of language ability
being tested.
3. After a couple of sentences of uninterrupted text, deletions should
be made at about every eighth or tenth word (the so-called pseudo-random method of deletion). Individual deletions can then be moved a
word or two to left or right, to avoid problems or to create interesting
‘items’. One may deliberately make gaps that can only be filled by reference to the extended context.
4. The passage should then be tried out on a good number of comparable
native speakers and the range of acceptable responses determined.
5. Clear instructions should be devised. In particular, it should be made
clear what is to be regarded as a word (with examples of isn’t, etc.,
where appropriate). Students should be assured that no one can possibly replace all the original words exactly. They should be encouraged
to begin by reading the passage right through to get an idea of what is
being conveyed (the correct responses early in the passage may be
determined by later content).
6. The layout of the second test in the chapter (Ecology) facilitates
scoring. Scorers are given a card with the acceptable responses
written in such a way as to lie opposite the candidates’ responses.
7. Anyone who is to take a cloze test should have had several opportunities to become familiar with the technique. The more practice they
have had, the more likely it is that their scores will represent their
true ability in the language.
8. Cloze test scores are not directly interpretable. In order to be able to
interpret them we need to have some other measure against which
they can be validated.
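The acceptable-response scoring described in points 4 and 6 translates directly into a simple checking routine. A sketch (the names are my own), using the first three Ecology answers as the key:

```python
def score_cloze(responses, acceptable):
    """Score a cloze paper against a key of acceptable responses.

    responses:  the candidate's answers, one per gap
    acceptable: a set of acceptable words for each gap, as
                established by trialling on native speakers
    """
    score = 0
    for answer, accepted in zip(responses, acceptable):
        # Compare case-insensitively and ignore stray spaces.
        if answer.strip().lower() in {a.lower() for a in accepted}:
            score += 1
    return score

# Key for the first three Ecology gaps ('towards' added as an
# illustrative alternative to 'to').
key = [{"on"}, {"to", "towards"}, {"one"}]
print(score_cloze(["On", "towards", "a"], key))  # → 2
```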
Mini-cloze items
One problem with the cloze technique is that, once a passage has been
chosen, we are limited as to what features of language can be tested.
We can only test the grammatical structures and vocabulary that are to
be found in that passage. To get good coverage of the features that we
think are relevant, we have to include a number of passages, which is
hardly economical. An alternative approach is to construct what may be
called mini-cloze items. These may take various forms, but one that I
have used is an item that represents a brief exchange between two or
more people, with just one gap. For example:
A: What time is it?
B: ____________ a quarter to three.
A: You look tired.
B: Yes, I stayed _____________ really late last night. Had to
finish that book.
In this way, we can cover just the structures and vocabulary that we
want to, and include whatever features of spoken language are relevant
for our purpose. If, for example, we want to base the content of the test
on the content of the text books used in language schools, including a
representative sample of this is relatively straightforward. The one
possible disadvantage by comparison with more normal cloze is that the
context that must be taken into account in order to fill a gap correctly
is very restricted, but for such purposes as placement testing, this would
not seem a serious defect.¹
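A bank of such mini-cloze items, tagged by the feature each tests, makes it straightforward to assemble a test with controlled coverage. The sketch below is illustrative only: the two exchanges come from the examples above, but the answer sets and feature labels are my own.

```python
import random

# Each item: (dialogue with one gap, acceptable answers, feature tested).
# Remember that contracted forms such as it's count as one word.
ITEMS = [
    ("A: What time is it?\nB: ____ a quarter to three.",
     {"it's"}, "time expressions"),
    ("A: You look tired.\nB: Yes, I stayed ____ really late last night.",
     {"up"}, "phrasal verbs"),
]

def draw_test(items, features, rng=random):
    """Draw one item per wanted feature, giving a representative sample
    of the structures and vocabulary we want to cover."""
    pool = {}
    for prompt, answers, feature in items:
        pool.setdefault(feature, []).append((prompt, answers))
    return [rng.choice(pool[f]) for f in features if f in pool]
```

A placement test could then be generated by listing the features found in the school's textbooks and drawing one or two items per feature.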
The C-Test
The C-Test is really a variety of cloze, which its originators claim is
superior to the kind of cloze described above. Instead of whole words,
it is the second half of every second word that is deleted. An example
follows.
There are usually five men in the crew of a fire engine. One o___
them dri___ the eng___. The lea___ sits bes___ the dri___. The
ot___ firemen s___ inside t___ cab o___ the f___ engine. T___
leader kn___ how t___ fight diff___ sorts o___ fires. S___, when
t___ firemen arr___ at a fire, it is always the leader who decides
how to fight a fire. He tells each fireman what to do.
(Klein-Braley and Raatz 1984)
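The deletion rule is mechanical enough to automate. The sketch below reproduces the sample above by keeping the first half (len // 2 letters) of every second word after the lead-in; conventions for splitting odd-length words vary between authors, so treat the halving rule here as one reasonable choice rather than the definitive one.

```python
import re

def make_c_test(text, lead_in_words=12):
    """Delete the second half of every second word after an intact lead-in.
    One-letter words are left untouched, as in the sample item."""
    words = text.split()
    out, count = [], 0
    for i, word in enumerate(words):
        core = re.sub(r"[^A-Za-z]+$", "", word)  # strip trailing punctuation
        if i < lead_in_words or len(core) < 2:
            out.append(word)           # lead-in and one-letter words intact
            continue
        count += 1
        if count % 2 == 0:             # every second eligible word
            keep = len(core) // 2
            out.append(core[:keep] + "___" + word[len(core):])
        else:
            out.append(word)
    return " ".join(out)

print(make_c_test("There are usually five men in the crew of a fire engine. "
                  "One of them drives the engine."))
# → There are usually five men in the crew of a fire engine.
#   One o___ them dri___ the eng___.
```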
The supposed advantages of the C-Test over the more traditional cloze
procedure are that only exact scoring is necessary (native speakers
effectively scoring 100 per cent) and that shorter (and so more) passages
are possible. This last point means that a wider range of topics, styles,
and levels of ability is possible. The deletion of elements less than the
word is also said to result in a representative sample of parts of speech
being so affected. By comparison with cloze, a C-Test of 100 items takes
little space and not nearly so much time to complete (candidates do not
have to read so much text).
Possible disadvantages relate to the puzzle-like nature of the task. It
is harder to read than a cloze passage, and correct responses can often
be found in the surrounding text. Thus the candidate who adopts the
right puzzle-solving strategy may be at an advantage over a candidate of
similar foreign language ability. However, research would seem to indicate that the C-Test functions well as a rough measure of overall ability
in a foreign language. The advice given above about the development of
cloze tests applies equally to the C-Test.
Dictation
In the 1960s it was usual, at least in some parts of the world, to decry
dictation testing as hopelessly misguided. After all, since the order of
words was given, it did not test word order; since the words themselves
were given, it did not test vocabulary; since it was possible to identify
words from the context, it did not test aural perception. While it might
test punctuation and spelling, there were clearly more economical ways
of doing this.
At the end of the decade this orthodoxy was challenged. Research
revealed high correlations between scores on dictation tests and scores
on much longer and more complex tests (such as the UCLA ESLPE).
Examination of performance on dictation tests made it clear that words
and word order were not really given; the candidate heard only a stream
of sound which had to be decoded into a succession of words, stored,
and recreated on paper. The ability to identify words from context was
now seen as a very desirable ability, one that distinguished between
learners at different levels.
Dictation tests give results similar to those obtained from cloze tests.
In predicting overall ability they have the advantage of involving listening ability. That is probably the only advantage. Certainly they are as
easy to create. They are relatively easy to administer, though not as easy
as the paper-and-pencil cloze. But they are certainly not easy to score.
Oller, who was a leading researcher into both cloze and dictation,
recommends that the score should be the number of words appearing in
their original sequence (misspelled words being regarded as correct as
long as no phonological rule is broken). This works quite well when
performance is reasonably accurate, but is still time-consuming. With
poorer students, scoring becomes very tedious.
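Oller's sequence-based score can be approximated computationally as the length of the longest common subsequence of the original and transcribed word lists. This is a sketch of the counting principle only; it makes no attempt at his phonological treatment of misspellings.

```python
def dictation_score(original, transcript):
    """Score a dictation as the number of words appearing in their
    original sequence: the longest common subsequence of the two
    word lists, computed by the classic dynamic-programming table."""
    a = original.lower().split()
    b = transcript.lower().split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

print(dictation_score("the leader sits beside the driver",
                      "the leader sat beside driver"))  # → 4
```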
Because of this scoring problem, partial dictation (see pages 168–169)
may be considered as an alternative. In this, part of what is dictated is
already printed on the candidate’s answer sheet. The candidate has
simply to fill in the gaps. It is then clear just where the candidate is up to,
and scoring is likely to be more reliable.
When using dictation, the same considerations should guide the choice
of passages as with the cloze procedure. The passage has then to be
broken down into stretches that will be spoken without a break. These
should be fairly long, beyond rote memory, so that the candidates will
have to decode, store, and then re-encode what they hear (this was a
feature of the dictations used in the research referred to above). It is
usual, when administering the dictation, to begin by reading the entire
passage straight through. Then the stretches are read out, not too slowly,
one after the other with enough time for the candidates to write down
what they have heard (Oller recommends that the reader silently spell the
stretch twice as a guide to writing time).
Mini-partial-dictation items
As far as coverage is concerned, dictation suffers from the same disadvantage as cloze. The passage determines the limits of what can be
tested. For that reason, a series of mini-dialogues (possibly mixed with
monologues) of the following kind can be constructed.
The candidate sees:
A: When can I see you again?
B: How about ……………………………….. Thursday?
And hears: When can I see you again? How about a week on Thursday?
Conclusion
There are times when only a general estimate of people’s overall language
ability is needed and backwash is not a serious consideration. In such
cases, any of the methods (the quick and dirty methods, some would
say) described in this chapter may serve the purpose. Because of the
considerable recent interest in cloze and its common use as a learning
exercise in its usual multi-item format, it may be the most obvious
choice. I suspect, however, that the simple gap filling methods referred
to above as ‘mini-cloze’ and ‘mini-partial-dictation’ will give results at
least as good as those of any other method.
Reader activities
1. Complete the three cloze passages in the chapter. Say what you think
each item is testing. How much context do you need to arrive at each
correct response?
If there are items for which you cannot provide a satisfactory
response, can you explain why?
Identify items for which there seem to be a number of possible
acceptable responses. Can you think of responses that are on the
borderline of acceptability? Can you say why they are on the borderline?
2. Choose a passage that is at the right level and on an appropriate
topic for a group of students with whom you are familiar. Use it to
create tests by:
● deleting every seventh word after a lead in;
● doing the same, only starting three words after the first deleted
word of the first version.
Compare the two versions. Are they equivalent?
Now use one of them to create a cloze test of the kind recommended.
Make a C-Test based on the same passage. Make a partial dictation
of it too. How do all of them compare?
If possible administer them to the group of students you had in mind,
and compare the results (with each other and with your knowledge
of the students).
Further reading
For all issues discussed in this chapter, including dictation, the most
accessible source is Oller (1979). The research in which the first cloze
passage in the chapter was used is described in Oller and Conrad
(1971). Chapelle and Abraham (1990) used one passage but different
methods of cloze deletion (including C-Test) and obtained different
results with the different methods. Brown (1993) examines the characteristics of ‘natural’ cloze tests and argues for rational deletion. Farhady
and Keramati (1996) propose a ‘text-driven’ procedure for deleting
words in cloze passages. Storey (1997) investigates the processes that
candidates go through when taking cloze tests. Examples of the kind of
cloze recommended here are to be found in Cambridge Proficiency
Examination past papers. Hughes (1981) is an account of the research
into conversational cloze. Klein-Braley and Raatz (1984) and Klein-Braley (1985) outline the development of the C-Test. Klein-Braley (1997)
is a more recent appraisal of the technique. Jafarpur (1995) reports
rather negative results for C-Tests he administered. Lado (1961) provides
a critique of dictation as a testing technique, while Lado (1986) carried
out further research using the passage employed by Oller and Conrad, to
cast doubt on their claims. Garman and Hughes (1983) provide cloze
passages for teaching, but they could form the basis for tests (native
speaker responses given). Hughes et al. (1996, 1998) are placement tests
developed for ARELS (Association of Recognised English Language
Services) and based on mini-cloze and mini-partial-dictation items
(information on the Internet).
Answers to cloze tests
What is a college? The words deleted from the passage are as follows:
1. of; 2. a; 3. vocational; 4. is; 5. chief; 6. produce; 7. stamina; 8. mean;
9. with; 10. standards; 11. am; 12. believe; 13. and; 14. our; 15. moral;
16. for; 17. the; 18. and; 19. has; 20. our; 21. singularly; 22. who; 23.
a; 24. This; 25. a; 26. go; 27. and; 28. a; 29. citizenship; 30. aims; 31.
of; 32. our; 33. hesitate; 34. which; 35. to; 36. of; 37. seems; 38. should;
39. the; 40. else; 41. experience; 42. intellectual; 43. have; 44. as; 45. I;
46. feel; 47. me; 48. a; 49. student; 50. much.
Ecology. The words deleted from the passage are as follows: 1. on; 2. to;
3. one; 4. a; 5. If; 6. or; 7. it; 8. which/that; 9. his; 10. for; 11. be; 12.
wrong; 13. because; 14. part; 15. other; 16. choice/option; 17. these;
18. do; 19. what; 20. have.
Family reunion. Acceptable responses: 1. well; 2. of; 3. they; 4. looks,
seems; 5. Did, Didn’t; 6. it; 7. I; 8. if; 9. when; 10. I; 11. you; 12. for;
13. not; 14. I’ve; 15. to; 16. Are; 17. want; 18. you; 19. that; 20. you;
21. off; 22. the; 23. what, one; 24. too; 25. You.
¹ The fact that these ‘mini-cloze’ items are indistinguishable in form from gap-filling items presented in the previous chapter has not escaped me either!
https://doi.org/10.1017/CBO9780511732980.015 Published online by Cambridge University Press