The Language Teacher
September 2003

Teaching the Test or Teaching the Language:
A Look at Test Preparation

Michael Narron, Kiyoshi Hirase, Taichiro Minami, Soichi Takekata, and Tetsuko Adachi

Miyazaki University Faculty of Education and Culture




Introduction

This paper describes the results of research undertaken to determine what difference, if any, exists between the test scores of students taught using standardized-test preparation (TP) materials and those of students taught using general English (GE) materials. Results showed that while the TP courses yielded higher scores, the difference between TP and GE course score gains did not exceed 0.9% of the total possible score.

Background and Rationale of this Study

The English department at Miyazaki University was asked to arrange a program that would enable students to better prepare for examinations designed to measure communicative competence, the most common being the TOEIC, the TOEFL, and the Eigo Kentei Shiken.

Debate over whether the English department should engage in instruction directed specifically toward TP began early in the planning phase. Some felt such instruction would yield practical benefits to students mindful of future employment or foreign study opportunities, an important consideration at a time when a dwindling birth rate means increased competition among schools with seats to fill. Others felt students would benefit more from GE instruction, expressing the oft-voiced opinion that students who are taught GE should acquit themselves well on any English test they might encounter.

While commentary criticizing direct TP exists, it is too limited and indirect to be used in argument. Imamura (1978) complains that even teachers well aware of modern teaching methods "seem reluctant to put their knowledge to practice, saying that if they practiced what they believed in, their students will not be well prepared to pass entrance exams" (p. 16). Ochiai (2000) cites interference with TP as one of several reasons Japanese teachers of English feel reluctant to utilize native English speakers in class. Spolsky (1995) comments that TP programs give more weight to the direction testing takes than they do to the contributions of linguistics and psychometric analysis.

Research on the effectiveness of direct TP is also fairly limited. Beretta (1992) has criticized gains studies (studies undertaken by companies involved in creating or delivering TP courses), complaining that neither the methods of study utilized nor the specific results are made public. Coomber (1997) carried out a study on a TP course administered by the International Development Program of Australian Universities and Colleges (IDP) for the International English Language Testing System (IELTS) which proved inconclusive due to "problems . . . which prevented us from obtaining sufficient data upon which we could base any truly valid conclusions" (p. 30). Robb and Ercanbrack (1999) studied the effect of direct TP on English-major and non-English-major students. Hypothesizing that score gains for all students would be the same regardless of method of study, the two researchers found that non-major students who had utilized TOEIC preparation material demonstrated significant gains, particularly on the reading section, over other non-majors who had utilized material not designed for TOEIC preparation. Of special interest in their report is the brief summary of research done on coaching for the SAT, or Scholastic Aptitude Test, the de facto university entrance examination for American high school students, which concludes that the meager gains produced by such coaching justify neither the time nor the expense involved. Guest (2000) observed that university entrance exam preparation pedagogy in high schools actually fails to address the types of skills required to succeed in the entrance exams.

It is with an interest in determining with greater certainty whether students taught using GE materials would be placed at a disadvantage on tests of communicative competence that this study was undertaken. It is hoped that the results of this study will provide educators responsible for course design and textbook selection with useful information that will enable them to increase the quality of their programs.

The Study

Subjects

The subjects in this study were 894 first-year students from the Faculties of Agriculture, Education, and Engineering enrolled in Miyazaki University's General Education Communication English program.

Course Texts

Materials used in this study consisted of commercially available course texts in two general categories: material designed for TP and material designed for use in GE courses.

We define TP material as material specifically designed to provide test-taking instruction, practice questions, or practice examinations. As there is little content or procedural variation among the various preparation texts available commercially, we decided not to place any restrictions on material selection beyond the requirement that TP course teachers use material specifically labeled as test preparation.

GE course material is material not specifically designed to prepare students for any of the standardized tests of communicative competence. Such material includes readers of the reading-passage, short-story, or novel type, with the exercises normally associated with such texts: content questions, grammar exercises, and listening practice. GE course material also includes conversation-based material and material compiled from various sources. GE course teachers were likewise free to select material they felt appropriate to their needs. This decision reflects the fact that this study does not seek to examine differences among various program texts; rather, it seeks to examine the difference between two approaches: teaching a test and teaching a language. We felt, therefore, that any material not specifically designed to prepare students for a test would be appropriate.

Tests

Two modified versions of 7-year-old practice TOEFL examinations were used to evaluate student performance at the end of each semester. The modifications involved reducing content in order to accommodate testing-period time limitations. Scoring was adjusted to produce standard-scale TOEFL scores that could be used for comparison.

The approximately 115-minute TOEFL format was reduced to 60 minutes by halving the number of test items. Because the five-essay reading comprehension section could not be halved evenly, the third reading was either the shortest available essay or a longer essay edited to reduce its content. Questions for the third reading were deleted carefully to preserve a reasonably balanced level of difficulty; we accomplished this by removing an equal number of questions from each level of difficulty.

We adjusted point values by dividing the maximum possible score for each section of the unmodified practice TOEFL by the number of questions for each section of our modified exam and assigning the resulting value equally to each item.
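To make the adjustment concrete, the short sketch below computes per-item point values by dividing each unmodified section maximum by the number of items remaining in the modified section. The section maximums and item counts shown are invented for illustration (chosen only so that the maximums sum to the 673-point total mentioned in the Results); the actual section breakdown is not given in this article.

```python
# Illustration of the point-value adjustment described above.
# Section maximums and item counts are invented examples, not the actual figures.

# Maximum possible score for each section of the unmodified practice TOEFL (hypothetical)
unmodified_section_max = {"listening": 220, "structure": 220, "reading": 233}

# Number of questions in each section of the modified (shortened) exam (hypothetical)
modified_item_count = {"listening": 25, "structure": 20, "reading": 25}

# Each item is worth the section maximum divided by the number of items
# remaining in that section of the modified exam.
point_value = {
    section: unmodified_section_max[section] / modified_item_count[section]
    for section in unmodified_section_max
}

def score_section(section, items_correct):
    """Adjusted score for one section of the modified exam."""
    return items_correct * point_value[section]

for section, value in point_value.items():
    print(f"{section}: {value:.2f} points per item")

print("Example reading section score (20 of 25 correct):",
      round(score_section("reading", 20), 1))
```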

Procedure

We assigned students randomly to class groups of 40. Each group was designated as TP or GE according to the textbook material selected by the instructor responsible for that group. Modified TOEFL examinations were given to students in both groups as final exams at the end of both the first and second semesters. At the beginning of the second semester, students were regrouped according to their scores on the first-semester final examination. Because there were both TP and GE classes at each score-category level, we were able to assign students to these classes randomly. This regrouping produced four course categories for study:

1st Sem.   2nd Sem.
TP —— TP
TP —— GE
GE —— TP
GE —— GE

Examinations were collected, scored and evaluated at the end of the first semester and at the end of the academic year.
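As a rough sketch of how this regrouping might be carried out in practice, the code below bands students by their first-semester scores and then assigns them at random to the classes available within each band. The score bands, class labels, and data layout are assumptions made for the illustration; they are not details taken from the study itself.

```python
import random

# Sketch of score-based regrouping followed by random assignment within each band.
# Score bands, class labels, and data layout are illustrative assumptions.

def band(score):
    """Place a first-semester score into a hypothetical score category."""
    if score >= 500:
        return "high"
    if score >= 400:
        return "mid"
    return "low"

def regroup(students, classes_by_band):
    """Assign students to second-semester classes at random within their score band.

    students:        list of (student_id, first_semester_score) pairs
    classes_by_band: dict mapping each band to the class labels available in it
                     (each band contains both TP and GE classes)
    """
    assignments = {label: [] for labels in classes_by_band.values() for label in labels}
    grouped = {}
    for student_id, score in students:
        grouped.setdefault(band(score), []).append(student_id)

    for band_label, ids in grouped.items():
        random.shuffle(ids)                      # random assignment within the band
        labels = classes_by_band[band_label]
        for i, student_id in enumerate(ids):
            assignments[labels[i % len(labels)]].append(student_id)
    return assignments

# Example usage with made-up students and classes
students = [(n, random.randint(350, 600)) for n in range(1, 121)]
classes_by_band = {
    "high": ["TP-1", "GE-1"],
    "mid":  ["TP-2", "GE-2"],
    "low":  ["TP-3", "GE-3"],
}
for label, members in regroup(students, classes_by_band).items():
    print(label, len(members))
```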

Results

As can be seen in Figure 1 below, score differences ranged from a maximum increase of 23.4% to a maximum decrease of 19.3%, with the average difference between 1st and 2nd semester test scores falling between 3.1% and 4.0% of the total possible test score. For the two groups with the greatest difference in content, TP-TP and GE-GE, the difference between their average gains was 0.9% of the total possible test score. The difference between the average gains of the two mixed groups, TP-GE and GE-TP, was 0.8% of the total possible score.

Figure 1. Score differences


Conclusions and Discussion

Of immediate interest is the slight but clear tendency for TP group scores to be higher. The TP-TP group's average score gain was the highest at 4.0%, or 26.9 points. This represents a 6.1-point advantage over the GE-GE group's average gain of 20.8 points. Supporting this conclusion is the difference between the TP-GE and GE-TP scores: the TP-GE group engaged in test preparation during the first semester, while the GE-TP group did so during the second semester. In this study, the second-semester TP group's scores exceeded those of the second-semester GE group by 0.8%, or 6.3 points.

Of practical interest is the fact that even the greatest difference in gain represented a modest 0.9%, or 6.1 points out of a possible 673. A one-way ANOVA p-value of 0.3 indicates that the differences among the four groups are not statistically significant. This seems to indicate the absence of any marked practical benefit in coursework specifically tailored to test preparation, and it supports the conclusion that general English instruction might be equally effective in raising scores on norm-referenced examinations.
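For readers who wish to reproduce this kind of comparison, the sketch below shows how a one-way ANOVA over the four groups' score gains could be computed with scipy. The gain values listed are placeholders, since the individual student scores are not reproduced in this article.

```python
# One-way ANOVA over the four groups' score gains (sketch only).
# The gain lists below are placeholders, not the study's data.
from scipy import stats

gains_tp_tp = [30, 22, 35, 18, 27, 31]
gains_tp_ge = [25, 19, 28, 21, 24, 26]
gains_ge_tp = [27, 20, 31, 17, 26, 29]
gains_ge_ge = [21, 18, 26, 15, 23, 22]

f_stat, p_value = stats.f_oneway(gains_tp_tp, gains_tp_ge, gains_ge_tp, gains_ge_ge)
print(f"F = {f_stat:.2f}, p = {p_value:.2f}")
# A p-value well above 0.05 (the study reports p = 0.3) means the differences
# among the four groups' gains are not statistically significant.
```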

Though further inquiry is necessary to more fully examine the implications of these findings, these results should encourage teachers who feel it inappropriate to teach for tests by assuring them that more traditional coursework does not interfere with test performance. These results might also translate into cost benefits for program planners who wish to re-evaluate the need for programs specifically tailored to test preparation.

Acknowledgement

The authors would like to thank Michael Guest of Miyazaki Medical College for his support and guidance on this paper.

References

Beretta, A. (1992). Evaluation of language education: An overview. In J. Alderson & A. Beretta (Eds.), Evaluating second language education. Cambridge: Cambridge University Press.
Coomber, J. (1997). Are test preparation programs really effective?: Evaluating an IELTS preparation course. Unpublished doctoral dissertation, University of Surrey, Guildford, United Kingdom.
Guest, M. (2000). But I have to teach grammar! The Language Teacher, 24(11), 23-31.
Imamura, S. (1978). Critical views on TEFL: Criticism of TEFL in Japan. In I. Koike (Ed.), The teaching of English in Japan. Tokyo: Eichosha.
Ochiai, N. (2000). AET to JTE no kyoudou no genjou: Kasukabe shi no rei wo moto ni [The state of cooperation among AETs and JTEs as observed in Kasukabe City]. The Language Teacher, 24(8), 20-24.
Robb, T., & Ercanbrack, J. (1999). A study of the effect of direct test preparation on the TOEIC scores of Japanese university students. TESL-EJ, 3(4). Retrieved November 12, 2002, from www-writing.berkeley.edu/TESL-EJ/ej12/a2.html
Spolsky, B. (1995). Measured words. Oxford: Oxford University Press.


