Dimensions in the Diversity of Language:
A Language Testing Perspective
Don Porter
Centre for Applied Language Studies, University
of Reading, England
Diversity in language performance
Bachman (1990) draws powerful attention to the instability of learner
behaviour on different tests of the same skill, e.g.:
Some test takers, for example, may perform better in the context of
an oral interview than they would sitting in a language laboratory speaking
into a microphone in response to statements and questions presented through
a pair of earphones. And individuals who generally perform well in oral
interviews may find it difficult to speak if the interviewer is someone
they do not know... 'Live' versus recorded presentation of aural material,
personality of examiner, filling in the blanks of isolated sentences...are
but a few examples of the ways in which the methods we employ in language
tests can vary. (p.111)
The result, it is implied, is that two tests which purport to measure
the same linguistic ability, but by different methods, may differ in the
account they give of an individual test taker's linguistic ability -- a situation
which ought to give rise to concern (see also Negishi, 1996, with respect
to reading). Until relatively recently, surprisingly little attention was
paid to this phenomenon, although it is widely attested in the informal
comments of teachers and testers alike, and has the potential to seriously
distort attempts at interpreting test results. As is so often the case in
language testing, the explanation may lie in the fact that the domain to
be tested is frequently not defined with any precision. Teachers, consumers
of test results, and often testers themselves rest contented with such vague
and general concepts as 'reading comprehension', 'oral proficiency', etc.,
while at least some of the features noted by Bachman as varying from test
method to test method, and as eliciting diverse performances from a test
taker, are part of the normal conditions of natural language use. Such features
-- it would seem reasonable to suggest -- need to be systematically built
into test specifications. On the other hand, features of test tasks which
affect learner performance but which are not normal conditions of natural
language use need to be controlled for or eliminated. Following Guttman
(1970), Bachman refers to such characterising features of test tasks as
'test method facets'.
Sources of diversity in performance on language tests
Of course, no test has perfect reliability, so even if a single
very good test is given to two language learners of equal ability,
simple measurement error will ensure that the results are unlikely to
be absolutely identical. When one learner takes the same language test
twice, each administration contributes its own error, so the combined
measurement errors are likely to produce still greater
differences in the assessment. Measurement error is known to arise from
misleading prompts, errors in the test key, ambiguities in instructions,
etc., as well as from unpredictable and predictable features of the test
taker (Kunnan, 1995). It is obvious that every effort should be made to
eliminate potential test-based sources of unreliability in test-taker performance,
as these will lead to inaccuracy in assessment.
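The point about repeated administrations can be sketched as a toy simulation under classical test theory (observed score = true score + random error). The true score, the size of the error, and the normal error distribution below are illustrative assumptions, not figures drawn from any of the studies cited:

```python
import random

random.seed(0)

TRUE_SCORE = 70.0   # hypothetical "true" ability of a learner
ERROR_SD = 5.0      # hypothetical standard error of measurement

def administer(true_score, sd=ERROR_SD):
    """One test administration: observed score = true score + random error."""
    return true_score + random.gauss(0, sd)

# Two learners of identical true ability take the same test once each:
# measurement error alone makes their observed scores differ.
learner_a = administer(TRUE_SCORE)
learner_b = administer(TRUE_SCORE)

# One learner takes the same test twice: each sitting adds its own error,
# so the two observed scores also differ, typically by a comparable amount.
first_sitting = administer(TRUE_SCORE)
second_sitting = administer(TRUE_SCORE)

print(round(abs(learner_a - learner_b), 1))
print(round(abs(first_sitting - second_sitting), 1))
```

With an error standard deviation of 5, the difference between two independent observed scores has a standard deviation of about 7 (the errors of the two sittings combine), which is why a single score should never be read as an exact measure of ability.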
Similarly, lack of validity in one or both tests, or different interpretations
of validity in the sense that competing 'models' of the ability in question
form the bases for the tests, may produce markedly different assessments
of a learner's ability. To avoid lack of validity, every effort must be
made in the process of test-construction and development to ensure both
that the test is based on an adequate theoretical model of linguistic ability,
and that the test is itself an adequate embodiment of that model. Differences
in assessment of language ability which stem from inadequacies in the underlying
linguistic model, or in the incorporation of that model in a test, must
be regarded as error.
However, in the case of tests satisfactorily based on competing but reasonable
linguistic models, perhaps capturing different insights into the nature
of the ability being measured, and thus having what we might call competing
validities, some differences in the eventual assessment are to be expected,
and should not be ascribed to measurement error. Users of a test should
be made aware that tests may differ in their approach to the assessment
of linguistic ability, and that the test they are using has its own special
focuses and characteristics. We have to accept, however, that the finer points
of the theoretical bases of a test will often be beyond most test users.
As mentioned at the beginning of this paper, in recent years attention
has increasingly been paid to the effects on learner performance of 'test
method facets', as discussed in Bachman (1990). Attention was drawn to the
fact that some of these facets are peculiar to language tests (e.g. speaking
into a microphone and responding to pre-recorded utterances presented over
headphones; filling in blanks in isolated sentences), while others are
a natural part of normal everyday language use (e.g. speaking to a 'live'
person and responding to spontaneous utterances; speaking to both known
and unknown people; speaking to people with evidently different personalities).
What Bachman and others do not make clear is that (a) facets peculiar to
language tests which affect test performance are undesirable, and their
effects should be minimised if outright elimination is not possible, while
(b) facets which affect test performance and which are a natural part of
normal language use are desirable in test methods, or even requirements
if methods are to be fully valid.
Implications for testing
In this section we consider the general implications for testing of the
diversity of method facets found in normal language use. We then consider
some specific implications of the gender of the interlocutor in interview
tests, and of mutual acquaintanceship of participants in pair-tasks. Finally,
we consider implications of addressee age in letter-writing tasks. The intention
of the discussion is less to focus on the specific facets involved, and
more to consider issues raised when these facets are built into the test
design.
General implications: Research into the effects of test method
facets is still in its infancy. Research into those facets which (a) significantly
affect foreign language performance, and (b) are a natural part of normal
language use, is embryonic. The candidates so far proposed for this latter
category do, however, suggest that while the systematic inclusion of such facets would
substantially enrich the validity of tests, it could simultaneously make
them more complex and more time-consuming to administer or to take, and
more difficult to report and interpret.
Let us take as an example Bachman's entirely plausible suggestion that
the personality of the interlocutor in an interview test might affect the
performance of the test-taker. It is possible that where the interlocutor
and the test-taker have similar personalities, the test-taker's performance
will be enhanced, but where the personality-types differ, performance will
be weakened. To be fair to all, then, the personalities of all concerned
would need to be assessed, and each test taker would need to be interviewed
twice - in two comparable but not identical interviews, of course - once
by a similar-personality interlocutor, and once by a different-personality
interlocutor. The question would then arise: Should each of the two performance-types
be reported separately, as representing two separate sub-types of oral proficiency,
or should oral ability be represented by the average of the two performances?
The latter would be more practical, of course -- but which would be the
more valid?
As so often in life, the solution would doubtless need to be some form
of compromise, in which as much of the diversity implied by the test method
facet would be included as was compatible with a practical test.
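The reporting choice raised above can be made concrete with a small sketch. The band scores and function names here are hypothetical, invented purely to illustrate the two options:

```python
def report_separately(similar_score, different_score):
    """Report the two interview conditions as distinct sub-scores,
    treating each as a separate sub-type of oral proficiency."""
    return {
        "similar-personality interlocutor": similar_score,
        "different-personality interlocutor": different_score,
    }

def report_averaged(similar_score, different_score):
    """Collapse the two conditions into a single oral proficiency score."""
    return (similar_score + different_score) / 2

# Hypothetical band scores (0-9 scale) for one candidate:
print(report_separately(7.0, 5.5))
print(report_averaged(7.0, 5.5))   # -> 6.25
```

The averaged report is simpler to use, but it hides exactly the variation that motivated the second interview; the separate report preserves that information at the cost of a more complex score profile.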
The gender of the interlocutor: Research with learners from many
different cultural backgrounds (O'Sullivan & Porter, 1996; Porter & Shen
Shu-Hung, 1991) indicates that the gender of the interlocutor is almost
always a significant method facet. While learners from some cultures perform
better when the interlocutor is a man, it is usually the case that learners
of either gender perform better when their interlocutor is a woman: It seems
that women from many cultures tend to use language in interaction in a more
facilitative way. The issues here, then, are directly comparable to those
described in relation to the hypothetical case of interlocutor personality.
It would seem that, where possible, students should interact with both a
male and a female interlocutor.
Mutual acquaintanceship: In a study of the effect of learner-acquaintanceship
(Japanese students) on pair-task performance, O'Sullivan and Porter (1997)
found some indication that mutual acquaintanceship might have a beneficial
effect on the performance of students at higher levels of proficiency. The
reasonably practical implication might be that in pair-tasks students should
always be placed in acquaintance-pairs, as even at lower proficiency levels
no actual impairment of performance would result.
Addressee age: O'Sullivan and Porter (1995) found that Japanese
learner-writers consistently produced better quality writing when writing
to someone identified as being older than themselves. This clearly implies
the importance for the learner of having a specified reader: A generalised
writing task may well not elicit the student's best performance.
Conclusion
The incorporation in test tasks of a degree of naturalness in the form
of facets from normal language use would seem to be both desirable and feasible.
Bibliography
Bachman, L. (1990). Fundamental
considerations in language testing. Oxford: Oxford University Press.
Guttman, L. (1970). Integration of test design and analysis.
In Proceedings of the 1969 Invitational Conference on Testing Problems.
Princeton, NJ: Educational Testing Service.
Kunnan, A.J. (1995). Test taker characteristics and test
performance. Cambridge: Cambridge University Press.
Negishi, M. (1996). Unpublished PhD thesis, University
of Reading.
O'Sullivan, B., & Porter, D. (1995). The importance of
audience age for learner-speakers and learner-writers from different cultural
backgrounds. Paper presented at the RELC conference, Singapore.
O'Sullivan, B., & Porter, D. (1996). Speech style, gender
and oral proficiency interview performance. Paper presented at the RELC
conference, Singapore.
O'Sullivan, B., & Porter, D. (1997). The effect of learner
acquaintanceship on pair-task performance. Paper presented at the RELC conference,
Singapore.
Porter, D., & Shen Shu-Hung. (1991). Gender, status and
style in the interview. The Dolphin 21. Aarhus University Press.
Don Porter's workshop is sponsored by the Centre
for Applied Language Studies, University of Reading.
All articles at this site are copyright © 1997 by their respective authors.
Document URL: http://www.jalt-publications.org/tlt/files/97/oct/porter.html
Last modified: October 19, 1997