Take corpus linguistics into your own hands with the Compleat Lexical Tutor

Peter Parise, Higashi Katsushika High School / Matsudo High School


In corpus linguistics, most of the advances have come in the development of dictionaries and textbooks, but in the literature on the use of corpora in the classroom the picture does not seem so rosy. “Data-driven” learning, advocated by Tim Johns, is an inductive rather than deductive process in which students learn the target language through analysis of examples, such as concordance lines derived from corpora. Johns comments that this is valuable for students, and that “research is too serious to be left to the researcher” (Johns, 1991). The hesitancy to embrace such a teaching approach may be two-pronged. First, Groom (2009) comments that teachers’ hesitation stems partly from concern over how students will respond to using corpus tools in the classroom. Second, Zhang (2008) states that technical aspects such as terminology and software are unfamiliar to EFL teachers, making them reluctant to use corpora in their teaching practice.

The Compleat Lexical Tutor, or Lextutor, created by Tom Cobb of the University of Montreal, Quebec, is dedicated to “data driven learning on the web” (Cobb, 1997) and makes these practices accessible. The site also emphasizes the primacy of vocabulary by providing applications for testing, improving, and researching vocabulary learning.

The site provides resources for teaching not only English but also French and Spanish. The welcome page presents three categories for use: a section for students called tutorial, an area for research, and tools for teachers. See Figure 1, below.

For students

The student section is devoted to tutorials and data-driven tasks. The Corpus Grammar tool lets students check their “grammar intuition” (Cobb, 1997) against actual corpus findings. Near the top of the page, the student can choose a specific grammar problem to practice. The task is to evaluate sentence errors with the help of concordances and determine the correct usage based on the data. Students enter the correction, which is then checked by the site. Through this process, the student is encouraged to think inductively about how words are used based on the examples provided.

Tools for teachers

In the applications located here, the teacher supplies the texts and uses the tools to create interactive activities for students. One such tool is the I-D Word identification quiz, which can support vocabulary learning. To use this quiz, the teacher selects from the word lists provided or inputs vocabulary relevant to students’ needs. The quiz presents a jumbled set of letters, and at the bottom the student is shown concordance lines with the target word deleted. The student has to select from the jumble the correct word that fits the meaning of the concordance lines. See Figure 3. The teacher can save a quiz on the site, and students can then access it through a link.

The word lists available include a corpus taken from graded readers for the first 1,000 words and another for the second 1,000. An academic word list is also included, taken from the Brown Corpus and the University Word List.
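For readers who want to see the idea behind such a quiz in concrete terms, the following is a minimal Python sketch of gapped concordance lines. It is not Lextutor's own code; the function names gapped_lines and is_correct and the sample sentences are hypothetical, chosen only for illustration.

```python
import re


def gapped_lines(sentences, target, blank="_____"):
    """Return only the sentences containing `target`, with the word blanked out."""
    # Match the target as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    return [pattern.sub(blank, s) for s in sentences if pattern.search(s)]


def is_correct(answer, target):
    """Compare the student's answer with the target word, ignoring case and spaces."""
    return answer.strip().lower() == target.lower()


# Hypothetical example sentences standing in for real corpus lines.
sample = [
    "The results were significant at every level.",
    "She made a significant contribution to the project.",
    "The weather was cold that morning.",
]

for line in gapped_lines(sample, "significant"):
    print(line)
```

A teacher could feed in sentences from class readings and a word from the class vocabulary list; the student then guesses the word that fits every gap.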

Other features in the Teachers section include a text-to-speech tool, in which the computer reads a text for the student, and the cloze builder, which can aid in creating cloze tests based on frequency lists.
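The general idea of building a cloze test from a word list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the teacher wants to blank out every word that appears on a chosen target list, which may differ from how the site's cloze builder selects its gaps. The function name build_cloze and the sample list are hypothetical.

```python
import re


def build_cloze(text, target_words, blank="_____"):
    """Blank out every word found in `target_words`.

    Returns the gapped text and the list of removed words in order,
    which can serve as an answer key.
    """
    answers = []

    def replace(match):
        word = match.group(0)
        if word.lower() in target_words:
            answers.append(word)
            return blank
        return word

    # Treat runs of letters and apostrophes as words; punctuation is left alone.
    cloze = re.sub(r"[A-Za-z']+", replace, text)
    return cloze, answers


# Hypothetical target list, for example words from an academic word list.
targets = {"compiled", "corpus"}
cloze, answers = build_cloze(
    "The teacher compiled a corpus for her students.", targets
)
print(cloze)
print(answers)
```

Swapping in a different word list changes which items are tested without any change to the text itself.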

Tools for researchers

The tools in this section are useful not only for conducting corpus-based research but also as a resource for teachers. The concordancing program is valuable for both teaching and research because it provides access to corpora such as the British National Corpus, the Brown Corpus, and others. It can be used to cross-reference prescriptive grammar with actual usage in either written or spoken registers, which is particularly useful when I am asked about the appropriate usage of a certain grammar point or vocabulary word. It is also a good way to find samples of actual language use rather than contrived examples. The concordancer is also noteworthy for its other corpora, such as a corpus of US TV and radio language, which can be used for investigating spoken registers. Also quite handy are the learner corpora, one of which is a corpus from Japanese learners using English.
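For teachers curious about what a concordancer does behind the scenes, here is a minimal keyword-in-context (KWIC) sketch in Python. It is a simplified illustration rather than the site's implementation; the function name kwic, the width parameter, and the sample text are assumptions made for this example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one line per occurrence of `keyword`, with the keyword aligned
    in a fixed column and `width` characters of context on each side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad the left context so that every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


# Hypothetical sample text standing in for a real corpus.
text = (
    "I look forward to meeting you next week. "
    "We look forward to the trip. "
    "Please step forward when your name is called."
)

for line in kwic(text, "forward", width=20):
    print(line)
```

Lining up the keyword in one column is what lets students scan the words to its left and right and notice patterns such as "look forward to" followed by an -ing form or a noun.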

This article only presents a glimpse of what is available, so please visit the Compleat Lexical Tutor at <lextutor.ca> and experiment with each section to get a feel for the tools available. The reality is that corpus-based teaching is not as remote as it seems. It just means taking corpus linguistics into your own hands.


Cobb, T. (1997). The Compleat Lexical Tutor [website]. University of Montreal, Quebec. Retrieved on September 10, 2009 from <lextutor.ca>.

Groom, N. (2009). Introducing corpora into the language classroom. The Language Teacher, 33(7), 26-28.

Johns, T. (1991). Should you be persuaded: Two examples of data-driven learning. In T. Johns & P. King (Eds.), Classroom concordancing (pp. 1-13). University of Birmingham, UK: Centre for English Language Studies.

Zhang, S. (2008). The necessities, feasibilities, and principles for EFL teachers to build a learner-oriented mini-corpus for practical classroom uses. Asian EFL Journal, Professional Teaching Journals, 29, 1-15. Retrieved on May 12, 2009 from <www.asian-efl-journal.com/pta_July_08_sz.php>.


Peter Parise teaches at three high schools in Chiba. His research interests include practical applications of learner corpora, and building corpora for research. You can follow his activities by visiting <www.tesolpeter.wordpress.com>.