Most of us in second language teaching who have encountered corpus linguistics are by now aware that it has great potential for SLA research. But how many of us know exactly what corpus linguistics is, and what it can do for us? For those interested, the Learner Corpus Workshop, held at Showa Women's University on October 6 and 7, 2001, was a perfect place to find out. The workshop, organized by a group of Japanese researchers specializing in learner corpus research, was intended to meet the increasing demand from those who want to learn about corpus linguistics and its practical applications to SLA research. This was the second such workshop, the first having been held in 1999, and this year some 50 people from all over Japan attended both days.
The program for October 6 started with plenary speeches by Professor Rod Ellis (Auckland University and a visiting professor at Showa Women's University) and by a Longman representative who spoke on behalf of Mr. Andrew Tope (Longman). These were followed by presentations from six researchers in Japan who have used learner corpus data in their own research projects. On October 7, three workshops were offered to teach the basic skills and knowledge needed to use computer software for handling corpus data, and each participant chose one of the three courses: Excel/Word, Perl, or WordSmith.
One of the highlights occurred at the very beginning of the event. In his plenary speech, titled "Real Data and Real Pedagogy," Ellis drew on his ample experience and knowledge in the field of SLA to offer insightful suggestions on the use of learner corpora for SLA. His lecture centered on two questions: (a) What kind of corpora should serve as the basis for designing a second/foreign language course? and (b) How should the results of corpus analysis be applied to the design of second/foreign language courses? In answering the first question, he argued that comparative analyses of native speaker and learner corpora are ideally required. He also suggested that corpora of native speakers' language use with learners might be highly useful, as they provide information about the kinds of language use that L2 learners experience at different stages of their development. In response to the second question, Ellis proposed that corpus-based analyses are best exploited through consciousness-raising (CR) tasks. He pointed out that one benefit of corpus data is that they show which target linguistic features are problematic not only through learners' errors (which can be observed rather easily without corpus data), but also through learners' avoidance (which can be detected only by comparing native speaker and learner corpora).
For all the possibilities and benefits of corpus data that he described, however, it was interesting to notice that Ellis repeatedly mentioned the limits of corpus linguistics in language pedagogy. One of the points he made was that corpora can assist in the design of courses only by stipulating "what" is to be taught; they can say nothing about the methodology of language teaching (i.e., "how" to teach). He also warned that even in selecting "what" to teach, we should not rely too much on the frequency analyses that corpus data provide, because there is a good chance that learners will learn high-frequency items anyway. Quoting Cook (1998), Ellis noted that "the leap from linguistics to pedagogy is far from straightforward," and repeatedly emphasized the importance of combining corpus linguistic research with SLA research. He also stressed the importance of teachers' intuition in bridging the gap between linguistics and pedagogy.
What followed this insightful speech was also worth listening to. The Longman representative's introduction to the explosion of new words in the English language was astounding. He showed examples of new English words such as "kidult" (an adult who likes to play games or buy things that most people consider more suitable for children) and "screenager" (a young person who spends a lot of time using computers and the Internet), and explained the ways in which such words are formed. Following this, six presentations on recent research using corpus data took place in two rooms. Their topics ranged from analyses of the written style of Japanese learners' data to an introduction to error annotation tools.
The next day was spent acquiring new skills that we hoped would turn each of us into a brand-new "corpus linguist." Of the three workshops offered, I attended the one on WordSmith, a commercial concordancer that allows you to conduct a variety of analyses. If you load your students' data into this software, for example, you can instantly make a word list in order of frequency, search the data for a keyword to find collocation patterns, or focus on key linguistic items to find frequent error patterns. If you have other data on your computer, such as an English textbook, you can easily compare it with your students' data. It took us a whole day to acquire the basic skills, but it gave all of us satisfaction to think that this investment would broaden our research options and save us a lot of time in carrying out that research in the future. My only concern now is whether I will remember everything I crammed into my head well enough to actually use it.
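For readers who have not seen a concordancer at work, the kinds of analysis described above can be roughly sketched in a few lines of Python. This is not WordSmith and not the material taught at the workshop; it is only a minimal illustration, and the function names (`tokenize`, `word_list`, `concordance`, `per_thousand`) and the two sample texts are invented for the purpose.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(text):
    """Return (word, count) pairs, most frequent first; ties in alphabetical order."""
    counts = Counter(tokenize(text))
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def concordance(text, keyword, width=2):
    """Return keyword-in-context lines with `width` tokens on either side of each hit."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def per_thousand(text, word):
    """Return how often `word` occurs per 1,000 tokens, so texts of different sizes can be compared."""
    tokens = tokenize(text)
    return 1000 * tokens.count(word) / len(tokens) if tokens else 0.0


# Invented sample data: a learner text and a textbook text (assumptions, not real corpus data).
learner_text = "I am interesting in music. I am interesting in sports too."
textbook_text = "She is interested in music. He is interested in sports."

print(word_list(learner_text)[:3])
print(concordance(learner_text, "interesting"))
print(per_thousand(learner_text, "interested"), per_thousand(textbook_text, "interested"))
```

With these two toy texts, the concordance lines show the learner's recurring "am interesting in" pattern, and the normalized frequencies show that "interested" appears in the textbook text but never in the learner text. Real learner corpora are of course far larger, and a dedicated tool handles details such as lemmatization and statistics that this sketch ignores.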
In closing, I would like to reiterate what I came to understand during the workshop: corpus linguistics has too much potential for anybody in language teaching to ignore. It can give you access to millions of words of corpus data from your home computer, which you can then tailor to your own use. But as with most modern technologies, its benefits may not be truly appreciated until you have used it. Those who are interested in exploring this new field should attend the next Learner Corpus Workshop, which is scheduled to take place in June 2002.
Cook, G. (1998). The uses of reality: A reply to Ronald Carter. ELT Journal, 52(1), 57-63.