Does it 'Work'? Evaluating Tasks in Language
Teaching
Rod Ellis
Temple University
A quick look at the published work on materials evaluation (e.g., Cunningsworth,
1984; Breen & Candlin, 1987; Skierso, 1991; McDonough & Shaw, 1993) reveals
that it is almost entirely concerned with predictive evaluation. That is,
it gives advice to teachers about how to conduct an evaluation of published
materials in order to determine whether the materials are suitable for a
given group of learners. This kind of evaluation is predictive in
the sense that it seeks to determine whether materials are likely to work
in a specific teaching context. Valuable as this kind of evaluation is,
it is not what I am concerned with here or in the workshop I will be giving
at the JALT International Conference in Hiroshima.
Instead, I want to consider how to carry out a retrospective evaluation
of teaching materials. That is, I want to address how teachers can determine
whether the materials they have actually used "work." It is my
guess that although teachers frequently do ask themselves whether the materials
they have selected or written work, they generally answer this question
impressionistically in the light of their day-by-day experiences of using
them. They rarely attempt a systematic and principled retrospective evaluation.
One obvious reason for this is the daunting nature of systematically
evaluating the use of a complete set of materials (e.g., a textbook). This
is an enormous undertaking, particularly if, as I shall shortly argue, the
evaluation is to involve some kind of attempt to discover what it is the
learners have learned as a result of using the materials. However, it may
be easier to carry out retrospective evaluations at the micro level by focusing
on whether specific teaching tasks work. My concern here, then, is with
task evaluations.
What does it mean to say a task "works"?
A good starting point for a retrospective micro-evaluation is to ask
what it means to say that a task has "worked." In fact, it can
mean a number of rather different things. First, teachers might feel that
a task has worked if they have evidence that the learners found it enjoyable
and useful. The evidence might take the form of the teacher noticing that
learners engage enthusiastically in performing the task, or it might take
the form of the students' responses to a post-task questionnaire designed
to elicit how useful they felt it was. This kind of student-based evaluation
is common and is probably the basis for most teachers' judgements about
the effectiveness of a task (see Murphy, 1993 for an example of a student-based
task evaluation).
It is perfectly possible, however, that students enjoy doing a task and
give it positive ratings in a questionnaire, and yet fail to perform it
successfully and/or learn nothing from it. It is also necessary, therefore,
to consider two other types of retrospective evaluation: a response-based
evaluation and a learning-based evaluation.
Richards, Platt and Weber (1985) define a task as "an activity or
action which is carried out as a result of processing or understanding language
(i.e., as a response)" (p. 289). It follows that the effectiveness
of a task might be determined by examining whether the "response"
of the learners is the same as the task was designed to bring about. This
kind of evaluation constitutes a response-based evaluation.
A task may be more or less "closed" or more or less "open"
according to the type of response asked for. In the case of tasks calling
for verbal responses a fill-in-the-blanks grammar task can be considered
closed in the sense that there is only one set of right answers, while a
free composition task can be considered open. A non-verbal response may
also be closed (e.g., a listening task that requires learners to fill in
missing names on a map), or open (e.g., a reading task that asks learners
to read a story and draw a picture of what they think the main character
looks like). Now, it is obviously much easier to determine whether the response
learners make matches the one they were intended to make when the task is
a closed one. Thus, teachers might feel the closed grammar and listening
tasks outlined above have worked if they observe that the students have
filled in most of the blanks correctly and have been able to write down
the missing names on the map. It is much more difficult to decide whether
an open task has worked as this requires teachers to identify criteria to
evaluate whether the learners' responses are appropriate or not. For example,
the students' response to the free writing task would need to be evaluated
in terms of a set of criteria for effective writing (e.g., some kind of
analytic marking scheme). The picture-drawing task would need to be evaluated
in terms of the extent to which the students' pictures took account of the
textual clues regarding the nature of the main character.
Thus, whereas the criteria for the evaluation of a closed task are embedded
within the task itself, the criteria required for evaluating an open task
are not. They are external to the task and, because they are usually not
specified by the person who devised the task, they place a considerable
burden on teachers' shoulders. This burden is notable because, in accordance
with the dictums of communicative language teaching, many teachers are making
greater use of open tasks. It is my guess that many open tasks are evaluated
impressionistically. That is, teachers do not generally make explicit the
criteria they are using to determine whether the learners' responses are
effective or not.
Evaluating the effectiveness of a task in terms of whether the learners'
responses are correct or appropriate constitutes what I call an internal
evaluation. The evaluation is "internal" in the sense that
no attempt is made to ask whether the nature of the response required
of the learner is a valid one: the evaluator simply assumes that the response
required is valid, and tries to establish whether the learners' actual response
matches the response intended by the task.
Such an evaluation is, of course, limited because it is possible for
a response to be correct or appropriate but still not be valid. It might
be argued, for example, that a grammar task that requires learners to fill
in the blanks with correct grammatical forms does nothing to promote the
acquisition of these forms (see Krashen, 1982). It might also be argued
that having students write free compositions does little to improve their
writing skills. Furthermore, it is perfectly possible that a task fails
to produce the intended response in learners and yet contributes to their
development in some way (e.g., learners may fail to answer a set of comprehension
questions on a reading passage correctly and yet learn a number of new words
as a result of completing the task). In short, a task may be effective but
invalid or it may be ineffective and yet valid.
A full evaluation of a task, therefore, calls for an external evaluation.
It is possible to carry out an external evaluation theoretically (i.e.,
by determining whether the assumptions that task designers make when they
design specific tasks are justified in the light of some theory of language
acquisition or skill development). In this case, the evaluation is predictive
in nature. To evaluate a task retrospectively calls for investigating whether
a task actually results in any new language being learned or in the development
of some skill. This calls for a learning-based evaluation. It is, of course,
not easy to demonstrate that a task - whether closed or open - has contributed
to language learning. One way might be to ask learners to note down what
they think they have learned as a result of completing a task (see Allwright,
1984 for discussion of "uptake" as a measure of learning).
To sum up, I have suggested that determining whether a task works calls
for different kinds of retrospective evaluations. A student-based evaluation
provides information about how interesting and useful learners perceive
a task to be. A response-based evaluation is internal in nature because
it simply addresses the question "Was the students' response the one
intended by the designer of the task?" A learning-based evaluation
is external in nature because it goes beyond the task itself by trying to
determine whether the task actually contributed to the learners' second
language proficiency.
The different kinds of evaluations -- student-based, response-based and
learning-based -- call for different types of information and different instruments
for collecting them. A full description of these information types and instruments
is obviously needed but is not possible in this brief article.
Conclusion
The evaluation of language teaching materials has been primarily predictive
in nature and has focused on whole sets of materials. There is a need for
more thought to be given to how teachers can evaluate the materials they
use retrospectively on a day-by-day basis. I have suggested that this can
be best carried out as a series of micro-evaluations based on the concept
of "task." Such evaluations are likely to accord with teachers'
own ideas of what evaluation entails.
Widdowson (1990) has argued the need for "insider research,"
by which he means that teachers should engage actively in trying out and
evaluating pedagogic ideas in their own classrooms. Such "action research,"
he suggests, is essential to help teachers develop an increased awareness
of the different factors that affect teaching and learning in classrooms.
One way in which teachers can undertake insider research is by conducting
task evaluations.
Task evaluations, therefore, serve a double purpose. They help to determine
whether particular tasks work and, thereby, contribute to the refinement
of the tasks for future use but, perhaps more importantly, they engage teachers
as insider researchers and, thus, contribute to their on-going professional
development.
References
Allwright, R. (1984). Why don't learners learn what teachers
teach? - The interaction hypothesis. In D. Singleton & D. Little (Eds.),
Language learning in formal and informal contexts (pp. 3-18). Dublin:
IRAAL.
Breen, M., & Candlin, C. (1987). Which materials? A consumer's
and designer's guide. In L. Sheldon (Ed.), ELT textbooks and materials:
Problems in evaluation and development (pp. 13-28). ELT Documents 126.
Modern English Publications in Association with the British Council.
Cunningsworth, A. (1984). Evaluating and selecting ELT
materials. London: Heinemann.
Krashen, S. (1982). Principles and practice in second
language acquisition. Oxford: Pergamon.
McDonough, J., & Shaw, C. (1993). Materials and methods
in ELT. Oxford: Blackwell.
Murphy, D. (1993). Evaluating language learning tasks in
the classroom. In G. Crookes & S. Gass (Eds.), Tasks in a pedagogical
context: Integrating theory and practice (pp. 139-161). Clevedon, Avon:
Multilingual Matters.
Richards, J., Platt, J., & Weber, H. (1985). Longman
dictionary of applied linguistics. London: Longman.
Skierso, A. (1991). Textbook selection and evaluation.
In M. Celce-Murcia (Ed.), Teaching English as a Second or Foreign Language
(pp. 432-453). Boston: Heinle and Heinle.
Widdowson, H. (1990). Pedagogic research and teacher education.
In H. Widdowson (Ed.), Aspects of Language Teaching (pp. 55-70).
Oxford: Oxford University Press.
Rod Ellis' workshop is sponsored by Longman Addison Wesley.
Article copyright © 1996 by the author.
Document URL: http://www.jalt-publications.org/tlt/files/96/sept/eval.html