Does it 'Work'? Evaluating Tasks in Language
Teaching
Rod Ellis
Temple University
A quick look at the published work on materials evaluation (e.g., Cunningsworth,
1984; Breen & Candlin, 1987; Skierso, 1991; McDonough & Shaw, 1993) reveals
that it is almost entirely concerned with predictive evaluation. That is,
it gives advice to teachers about how to conduct an evaluation of published
materials in order to determine whether the materials are suitable for a
given group of learners. This kind of evaluation is predictive in
the sense that it seeks to determine whether materials are likely to work
in a specific teaching context. Valuable as this kind of evaluation is,
it is not what I am concerned with here or in the workshop I will be giving
at the JALT International Conference in Hiroshima.
Instead, I want to consider how to carry out a retrospective evaluation
of teaching materials. That is, I want to address how teachers can determine
whether the materials they have actually used "work." It is my
guess that although teachers frequently do ask themselves whether the materials
they have selected or written work, they generally answer this question
impressionistically in the light of their day-by-day experiences of using
them. They rarely attempt a systematic and principled retrospective evaluation.
One obvious reason for this is the daunting nature of systematically
evaluating the use of a complete set of materials (e.g., a textbook). This
is an enormous undertaking, particularly if, as I shall shortly argue, the
evaluation is to involve some kind of attempt to discover what it is the
learners have learned as a result of using the materials. However, it may
be easier to carry out retrospective evaluations at the micro level by focusing
on whether specific teaching tasks work. My concern here, then, is with
task evaluations.
What does it mean to say a task "works"?
A good starting point for a retrospective micro-evaluation is to ask
what it means to say that a task has "worked." In fact, it can
mean a number of rather different things. First, teachers might feel that
a task has worked if they have evidence that the learners found it enjoyable
and useful. The evidence might take the form of the teacher noticing that
learners engage enthusiastically in performing the task, or it might take
the form of the students' responses to a post-task questionnaire designed
to elicit how useful they felt it was. This kind of student-based evaluation
is common and is probably the basis for most teachers' judgements about
the effectiveness of a task (see Murphy, 1993 for an example of a student-based
task evaluation).
It is perfectly possible, however, that students enjoy doing a task and
give it positive ratings in a questionnaire, and yet fail to perform it
successfully and/or learn nothing from it. It is also necessary, therefore,
to consider two other types of retrospective evaluation: a response-based
evaluation and a learning-based evaluation.
Richards, Platt and Weber (1985) define a task as "an activity or
action which is carried out as a result of processing or understanding language
(i.e., as a response)" (p. 289). It follows that the effectiveness
of a task might be determined by examining whether the "response"
of the learners is the same as the task was designed to bring about. This
kind of evaluation constitutes a response-based evaluation.
A task may be more or less "closed" or more or less "open"
according to the type of response asked for. In the case of tasks calling
for verbal responses a fill-in-the-blanks grammar task can be considered
closed in the sense that there is only one set of right answers, while a
free composition task can be considered open. A non-verbal response may
also be closed (e.g., a listening task that requires learners to fill in
missing names on a map), or open (e.g., a reading task that asks learners
to read a story and draw a picture of what they think the main character
looks like). Now, it is obviously much easier to determine whether the response
learners make matches the one they were intended to make when the task is
a closed one. Thus, teachers might feel the closed grammar and listening
tasks outlined above have worked if they observe that the students have
filled in most of the blanks correctly and have been able to write down
the missing names on the map. It is much more difficult to decide whether
an open task has worked as this requires teachers to identify criteria to
evaluate whether the learners' responses are appropriate or not. For example,
the students' response to the free writing task would need to be evaluated
in terms of a set of criteria for effective writing (e.g., some kind of
analytic marking scheme). The picture-drawing task would need to be evaluated
in terms of the extent to which the students' pictures took account of the
textual clues regarding the nature of the main character.
Thus, whereas the criteria for the evaluation of a closed task are embedded
within the task itself, the criteria required for evaluating an open task
are not. They are external to the task and, because they are usually not
specified by the person who devised the task, they place a considerable
burden on teachers' shoulders. This burden is notable because, in accordance
with the dictums of communicative language teaching, many teachers are making
greater use of open tasks. It is my guess that many open tasks are evaluated
impressionistically. That is, teachers do not generally make explicit the
criteria they are using to determine whether the learners' responses are
effective or not.
Evaluating the effectiveness of a task in terms of whether the learners'
responses are correct or appropriate constitutes what I call an internal
evaluation. The evaluation is "internal" in the sense that
no attempt is made to ask whether the nature of the response required
of the learner is a valid one: the evaluator simply assumes that the response
required is valid, and tries to establish whether the learners' actual response
matches the response intended by the task.
Such an evaluation is, of course, limited because it is possible for
a response to be correct or appropriate but still not be valid. It might
be argued, for example, that a grammar task that requires learners to fill
in the blanks with correct grammatical forms does nothing to promote the
acquisition of these forms (see Krashen, 1982). It might also be argued
that having students write free compositions does little to improve their
writing skills. Furthermore, it is perfectly possible that a task fails
to produce the intended response in learners and yet contributes to their
development in some way (e.g., learners may fail to answer a set of comprehension
questions on a reading passage correctly and yet learn a number of new words
as a result of completing the task). In short, a task may be effective but
invalid or it may be ineffective and yet valid.
A full evaluation of a task, therefore, calls for an external evaluation.
It is possible to carry out an external evaluation theoretically (i.e.,
by determining whether the assumptions that task designers make when they
design specific tasks are justified in the light of some theory of language
acquisition or skill development). In this case, the evaluation is predictive
in nature. To evaluate a task retrospectively calls for investigating whether
a task actually results in any new language being learned or in the development
of some skill. This calls for a learning-based evaluation. It is, of course,
not easy to demonstrate that a task - whether closed or open - has contributed
to language learning. One way might be to ask learners to note down what
they think they have learned as a result of completing a task (see Allwright,
1984 for discussion of "uptake" as a measure of learning).
To sum up, I have suggested that determining whether a task works calls
for different kinds of retrospective evaluations. A student-based evaluation
provides information about how interesting and useful learners perceive
a task to be. A response-based evaluation is internal in nature because
it simply addresses the question "Was the students' response the one
intended by the designer of the task?" A learning-based evaluation
is external in nature because it goes beyond the task itself by trying to
determine whether the task actually contributed to the learners' second
language proficiency.
The different kinds of evaluations -- student-based, response-based and
learning-based -- call for different types of information and different instruments
for collecting them. A full description of these information types and instruments
is obviously needed but is not possible in this brief article.
Conclusion
The evaluation of language teaching materials has been primarily predictive
in nature and has focused on whole sets of materials. There is a need for
more thought to be given to how teachers can evaluate the materials they
use retrospectively on a day-by-day basis. I have suggested that this can
be best carried out as a series of micro-evaluations based on the concept
of "task." Such evaluations are likely to accord with teachers'
own ideas of what evaluation entails.
Widdowson (1990) has argued the need for "insider research,"
by which he means that teachers should engage actively in trying out and
evaluating pedagogic ideas in their own classrooms. Such "action research,"
he suggests, is essential to help teachers develop an increased awareness
of the different factors that affect teaching and learning in classrooms.
One way in which teachers can undertake insider research is by conducting
task evaluations.
Task evaluations, therefore, serve a double purpose. They help to determine
whether particular tasks work and, thereby, contribute to the refinement
of the tasks for future use but, perhaps more importantly, they engage teachers
as insider researchers and, thus, contribute to their on-going professional
development.
References
Allwright, R. (1984). Why don't learners learn what teachers
teach? - The interaction hypothesis. In D. Singleton & D. Little (Eds.),
Language learning in formal and informal contexts (pp. 3-18). Dublin:
IRAAL.
Breen, M., & Candlin, C. (1987). Which materials? A consumer's
and designer's guide. In L. Sheldon (Ed.), ELT textbooks and materials:
Problems in evaluation and development (pp. 13-28). ELT Documents 126.
Modern English Publications in Association with the British Council.
Cunningsworth, A. (1984). Evaluating and selecting ELT
materials. London: Heinemann.
Krashen, S. (1982). Principles and practice in second
language acquisition. Oxford: Pergamon.
McDonough, J., & Shaw, C. (1993). Materials and methods
in ELT. Oxford: Blackwell.
Murphy, D. (1993). Evaluating language learning tasks in
the classroom. In G. Crookes & S. Gass (Eds.), Tasks in a pedagogical
context: Integrating theory and practice (pp. 139-161). Clevedon, Avon:
Multilingual Matters.
Richards, J., Platt, J., & Weber, H. (1985). Longman
dictionary of applied linguistics. London: Longman.
Skierso, A. (1991). Textbook selection and evaluation.
In M. Celce-Murcia (Ed.), Teaching English as a Second or Foreign Language
(pp. 432-453). Boston: Heinle and Heinle.
Widdowson, H. (1990). Pedagogic research and teacher education.
In H. Widdowson (Ed.), Aspects of Language Teaching (pp. 55-70).
Oxford: Oxford University Press.
Rod Ellis' workshop is sponsored by Longman Addison Wesley.
Article copyright © 1996 by the author.
Document URL: http://www.jalt-publications.org/tlt/files/96/sept/eval.html