Computer-Adaptive Testing of Listening
Comprehension: A Blueprint for CAT Development
by Patricia A. Dunkel
Georgia State University
The use of computer-adaptive testing (CAT) for placement/achievement
testing purposes has become an increasingly appealing and efficient method
of assessment. Simply put, CAT saves testing time and decreases examinee
frustration since low-ability examinees are not forced to take test items
constructed for high-ability examinees, and vice versa. As Bergstrom and Gershon
(1994, p. 1) note, "When the difficulty of items is targeted to the
ability of candidates, maximum information is obtained from each item for
each candidate, so test length can be shortened without loss of reliability."
A number of testing programs and licensing agencies in the United States
are switching from paper-and-pencil testing to CAT for the sake of efficiency
and effectiveness. The American College Testing program offers COMPASS,
a computer-adaptive placement test given to students entering college to
assess their preparedness to do college work in reading and math. A number
of high-stakes tests in the United States are also now administered in CAT
form (e.g., Educational Testing Service's Graduate Record Examination),
and those in charge of constructing the Test of English as a Foreign Language
(TOEFL) are in the process of developing a TOEFL CAT for administration to
various applicants taking the TOEFL in the twenty-first century. A number
of first- and second-generation second/foreign language CATs have been developed
in the United States for a variety of purposes, contents, and contexts, including:
French reading proficiency (Kaya-Carton, Carton & Dandonoli, 1991);
listening and reading comprehension (Madsen, 1991); English as a second
language (ESL)/bilingual entry- and exit-decision making (Stevenson, 1991);
ESL reading comprehension (Young, Shermis, Brutten, & Perkins, 1996);
and foreign language reading comprehension (Chaloub-Deville, in press; Chaloub-Deville,
Alcaya & Lozier, 1996).
Since CAT will, undoubtedly, become a more pervasive method of assessment
in the coming century, it behooves English-as-a-Foreign-Language (EFL) professionals
to become more knowledgeable about the capability and potential of computer-adaptive
testing. It also behooves them to explore developing and/or using CATs for
particular institutional testing purposes (e.g., for achievement and placement
testing). To construct valid and reliable CATs, some EFL professionals will
be forced by constraints of time and/or resources to create those CATs using
generic, commercial software (e.g., MicroCAT of Assessment Systems Corporation, Inc., or CAT Administrator from Computer Adaptive Technologies, Inc.). Others
may choose to develop their own testing shells from scratch (perhaps the
best but most costly and time-consuming approach to CAT development) if
the commercial CAT software packages prove too costly for large-scale CAT
administrations or if the commercial software fails to meet particular testing
needs (e.g., to develop a CAT requiring video and speech output, or eventually even speech recognition).
This article has a three-fold purpose: (1) to familiarize EFL teachers with some basic information about what a CAT is and how one operates;
(2) to describe the structure, content, and operation of the ESL listening
comprehension CAT; and (3) to acquaint those EFL professionals considering
CAT development de novo with the genesis, planning, and implementation
of the listening CAT software that drives the ESL listening CAT. To achieve
this final goal, the author describes both the inception and realization
of her CAT development project which was undertaken with funding from the
United States Department of Education, and with the support of instructional
designers and computer specialists at The Pennsylvania State University.
The author hopes to inspire readers to learn more about CAT, and to help
them decide whether they should use available, commercial CAT software programs
(e.g., MicroCAT) for their CAT development and administration, or whether
they should undertake creation of their own CAT software from scratch (no
pun intended). The information contained in this report may serve
as a rough blueprint for those CAT developers.
What is CAT?
Computer-adaptive testing (CAT) is a technologically advanced method
of testing which matches the difficulty of test items to the ability of
the examinee. Essentially, CAT is tailored testing. That is, when an examinee
takes a CAT, the test questions/items received are "tailored"
to the listening ability of that particular examinee. This means that responses
to an earlier item taken in the CAT determine which ensuing items are presented to the test taker by the computer. If an examinee answers an item correctly, the next item received is more difficult; however, if the examinee answers the item incorrectly, the next item received is easier. This
"adapting" procedure results in examinees taking more individualized
tests so that even in a large-scale testing situation in which 50 or more
people begin taking a CAT at the same time, it is very unlikely that any
of these examinees will take the exact same test.(1)
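To make the "adapting" procedure concrete, here is a minimal sketch of the basic up/down logic in standard C++. It is only an illustration of the general idea, not the code of any actual CAT: the item difficulties, the fixed step size, the random stand-in for the examinee's response, and the fixed-length stopping rule are all simplifying assumptions.

#include <cmath>
#include <cstdlib>
#include <iostream>
#include <vector>

struct Item {
    double difficulty;   // calibrated difficulty on an arbitrary logit-like scale
    bool   used;         // has this item already been administered?
};

// Choose the unused item whose difficulty is closest to the current estimate.
int pickItem(const std::vector<Item>& bank, double ability) {
    int best = -1;
    double bestDistance = 1e9;
    for (int i = 0; i < static_cast<int>(bank.size()); ++i) {
        if (bank[i].used) continue;
        double distance = std::fabs(bank[i].difficulty - ability);
        if (distance < bestDistance) { bestDistance = distance; best = i; }
    }
    return best;   // -1 when the bank is exhausted
}

int main() {
    std::vector<Item> bank = { {-2.0, false}, {-1.0, false}, {-0.5, false},
                               {0.0, false}, {0.5, false}, {1.0, false}, {2.0, false} };
    double ability = 0.0;       // start each examinee at a mid-range estimate
    const double step = 0.5;    // crude adjustment after each response
    for (int given = 0; given < 5; ++given) {    // fixed-length stopping rule
        int i = pickItem(bank, ability);
        if (i < 0) break;
        bank[i].used = true;
        bool correct = std::rand() % 2 == 0;     // stand-in for the examinee's answer
        ability += correct ? step : -step;       // harder item next if correct, easier if not
        std::cout << "difficulty " << bank[i].difficulty
                  << (correct ? ": correct" : ": wrong")
                  << ", new estimate " << ability << "\n";
    }
}

In a real CAT, the adjustment to the ability estimate and the decision to stop come from an Item Response Theory model rather than a fixed step, as discussed later in this article.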
What is the purpose of the ESL listening CAT?
The purpose of the ESL Listening Comprehension Proficiency CAT is to
evaluate the nonparticipatory listening comprehension for general English
content of literate ESL examinees. The CAT provides a ranking of the examinee's nonparticipatory listening comprehension in terms of nine levels of ability (novice-low; novice-mid; novice-high; intermediate-low; intermediate-mid; intermediate-high; advanced; advanced plus; superior). These rankings might be used for purposes of placement in (or out of) a variety of adult ESL programs.
What is the structure and content of the ESL listening
CAT?
The CAT is designed to evaluate an examinee's ability to understand overheard utterances and discourse ranging from individual words/phrases
(e.g., at the novice level of listening ability), to short monologues and
dialogues on various topics, and then to longer and more involved dialogues
and monologues at the advanced through superior levels. The use of cultural
references is minimized in the content of the novice-level items but increases in the items at intermediate and advanced levels of difficulty. For example, authentic text from radio programs is included in advanced-plus and superior items, which contain large doses of cultural material (e.g., the superior listener must identify the theme of a country-western song).
The items test four listener functions identified by Lund (1990, p. 107)
as being "important to second language instruction" and "available
to the listener regardless of the text": (1) recognition/identification;
(2) orientation; (3) comprehension of main ideas; and (4) understanding
and recall of details. It is expected that many additional listener functions
will be included in future iterations of the CAT, but the decision was made
initially to focus on creating items around these four listening functions
or tasks. Each listener function was embedded in a listening stimulus involving
a word or phrase (at the novice level) or a monologue or dialogue (at the
novice, intermediate, and advanced levels), and linked to a listener-examinee response involving (1) a text option, which required selection of one of two limited-response options (at the novice level), one of three options (at the intermediate level), or one of four options (at the advanced level); (2) a still-photo option, requiring selection of one of two pictures (graphics); or (3) an element-in-a-still-photo option, requiring selection of the correct response from among two or three elements within a unified photograph.
Items involving the four listener functions (identification, orientation,
main idea comprehension, and detail comprehension), the two
types of language (monologue and dialogue), and the three response formats (text, photo, and element-in-a-photo options) were written
for each of the nine levels of listening proficiency articulated in the
ACTFL Listening Guidelines (novice-low, novice-mid, novice-high;
intermediate-low, intermediate-mid, intermediate-high; advanced, advanced
plus, superior). This approach to development of the 144-item bank was taken
to ensure that the test developers and potential users would have a clear
understanding of which types of language and listening tasks the items in
the pool were aiming to assess. The item-writing framework was also used
to guide the ESL specialists with item writing since they were not necessarily
specialists in testing and measurement theory and practice. The item writers
attempted to devise easier items at the novice level and more difficult
items at the advanced level. (2) Field-testing
of the items (see discussion below) provided evidence that the item writers
were not always "on target" when designating items to be "low,
mid, or high" levels within each of the categories of proficiency (e.g.,
novice, intermediate, or advanced). Still, it was thought that asking the
item writers to follow a clearly defined framework of content, listener
functions, types of language, and examinee response formats as they began
construction of the item bank would allow them to use a more consistent
and enlightened approach to item writing.
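For readers who may want to implement a comparable item bank, the framework above maps naturally onto a small data record per item. The sketch below is written in C++ (the language later used for the Penn State testing shell), but it is purely illustrative: the type and field names are the author's categories recast as hypothetical identifiers, not the shell's actual data format.

#include <string>

enum class ListenerFunction { Identification, Orientation, MainIdea, Detail };
enum class StimulusType     { WordOrPhrase, Monologue, Dialogue };
enum class ResponseFormat   { TextOption, PhotoOption, ElementInPhoto };
enum class Level            { NoviceLow, NoviceMid, NoviceHigh,
                              IntermediateLow, IntermediateMid, IntermediateHigh,
                              Advanced, AdvancedPlus, Superior };

// One record per item in the bank: what it tests, how the stimulus is
// delivered, and how the examinee responds.
struct ListeningItem {
    ListenerFunction function;    // identification, orientation, main idea, or detail
    StimulusType     stimulus;    // word/phrase at novice; monologue or dialogue elsewhere
    ResponseFormat   format;      // text, photo, or element-in-a-photo option
    Level            level;       // one of the nine ACTFL-based levels
    std::string      soundFile;   // digitized stimulus plus the spoken question
    int              numOptions;  // 2 at novice, 3 at intermediate, 4 at advanced
};

int main() {
    // The novice-low "brother/sister" item described later in this article.
    ListeningItem example{ ListenerFunction::Identification, StimulusType::WordOrPhrase,
                           ResponseFormat::TextOption, Level::NoviceLow,
                           "brother.snd", 2 };   // the file name is hypothetical
    (void)example;   // illustration only
}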
The following discussion elaborates further upon the specific listener
functions (i. e., the test tasks) contained in the framework that guided
construction of the item bank.
Identification. According to Lund (1990), identification involves focusing on some aspect of the code itself, rather than on the content of the message, and equates with terms such as recognition and discrimination.
According to Lund, identification "is particularly associated with
the novice level because that is all novices can do with some texts. But
identification can be an appropriate function at the highest levels of proficiency
if the focus is on form rather than content" (p. 107).
Orientation involves the listener's "tuning in" or ascertaining
the "essential facts about the text, including such message-externals
as participants, their roles, the situation or context, the general topic,
the emotional tone, the genre, perhaps even the speaker function" (p.
108). Determining whether one is hearing a news broadcast and that the news
involves sports is an example of an orientation task, according to Lund.
Main Idea Comprehension involves "actual comprehension of
the message. Initially understanding main ideas depends heavily on recognition
of vocabulary. With live, filmed, or videotaped texts, the visual context
may also contribute heavily to understanding" (p. 108). "Deciding
if a weather report indicates a nice day for an outing," or "determining
from a travelogue what countries someone visited" constitute examples
of main idea comprehension, according to Lund (p. 108).
Detail Comprehension items test the listener's ability to focus
on understanding specific information. According to Lund (1990), this function
"may be performed independently of the main idea function, as when
one knows in advance what information one is listening for; or the facts
can be details in support of main ideas" (p. 108). Lund's examples
of this listener function include: following a series of precise instructions;
getting the departure times and the platform numbers for several trains
to a certain city, and so on.
In addition to using the Lund taxonomy of listening functions listed
above, the item writers also attempted to use the ACTFL Listening
Guidelines' generic descriptions for listening in the process of
creating the 144 items. For example, the Guidelines describe novice-low
listening in the following terms: "Understanding is limited to occasional
words, such as cognates, borrowed words, and high frequency social conventions.
Essentially no ability to comprehend even short utterances." The item
writers attempted to keep this descriptor in mind when creating the initial
bank of items. For example, in the novice-low identification item,
the listener hears a single word "brother" spoken and is asked to identify the text equivalent on the computer screen by selecting one of
the following two text options: (a) "brother"; (b) "sister."
(This particular item is a text-response item.) Additional words (and cognates)
will be added as the item bank expands in number. The Guidelines
suggest that the novice-mid listener is able to understand some short learned
utterances, particularly where context strongly supports understanding and
speech is clearly audible. The novice listener comprehends some words and
phrases for simple questions, statements, high-frequency commands and courtesy
formulae about topics that refer to basic personal information or the immediate
physical setting. Items created with this particular Guidelines description
in mind required listeners to indicate comprehension of main ideas presented
in the monologues or dialogues heard.
How does the ESL listening CAT function?
After the examinee has completed an orientation to the test, which teaches her how to use the computer to answer sample questions, the CAT operates
as follows: The computer screen presents the answer choices when a question
is called for by the examinee, who clicks on the "Next Question"
icon to receive a test item (or another item). The examinee can take as
much time as she likes to read the text choices (or the photos/graphics)
and get ready to call for the listening stimulus. When ready to listen,
the examinee clicks on the "Listen" icon, which looks like a loudspeaker.
An alert asks the test taker to "listen carefully." The listening
stimulus (e.g., the dialogue or monologue) is heard immediately thereafter.
(The alert and stimulus are played only when the examinee presses the loudspeaker
icon.) The comprehension question follows as soon as the dialogue/monologue
ceases, and the question is spoken by the same voice that provided the "listen
carefully" cue.
How was the ESL listening CAT created de novo?
It takes expertise, time, money, and persistence to launch and sustain
a CAT development project. Above all, it takes a great deal of teamwork. The prototype
ESL listening CAT was, in fact, the product of team effort on the part of
many people with various areas of expertise: (a) ESL language specialists;
(b) authorities in the field of testing and measurement; (c) computer programmers;
(d) instructional-technology designers, (e) graduate research assistants;
and (f) ESL instructors and students. The ESL specialists wrote the test
items; the testing and measurement authorities provided guidance on test
design and data analysis, in addition to providing critiques of the individual
test items; the computer programmers and instructional-technology designers
created the computer software to implement the test design in computerized
form; the graduate student assistants did a variety of tasks from creating
the item graphics to supervising field testing (or trialing) of the questions
in the item bank; the ESL students field tested the 144 prototype items
in the item pool; and the ESL instructors offered their classes for trialing
of the CAT and provided feedback on the strengths and weaknesses of particular
items.
The ESL CAT was developed with the support and assistance of the Educational
Technologies Service, a unit within The Pennsylvania State University's
Center for Academic Computing. The extensive support offered by this organization
infused the project with the considerable expertise of Macintosh computer
programmers, experienced instructional designers and software developers,
as well as savvy graduate students in educational technology. A brief explanation
of how the project was initiated and planned, together with a description of the actual programming environment, should illuminate some of the varied and complex aspects of the task.
Upon agreeing to take part in the project, the staff of the Educational
Technologies Service decided to use a systems approach to the design, development,
evaluation, and implementation of the CAT project. The phases of project
development included:
Project definition and planning. This phase included needs and
task analysis, goal-setting, defining the instructional solution and strategy,
determining evaluation methods, assigning personnel, reviewing the budget,
and determining technology tools and environment.
Design of a model section. This phase included planning screen
layout, instructional strategy, record keeping and reporting techniques,
and student assessment procedures.
Identification of all sections and/or modules. During this phase
the full scope of the project was planned.
Development and evaluation of the selected model or prototype. It was decided that the prototype would have full functionality and would contain a bank of 144 items for initial trialing. Evaluation included several types of formative evaluation, including a questionnaire soliciting the reactions of a subsample of examinees who took the computer-assisted version of the test concerning screen design, ease of use, operating problems, and so forth.
Product design. This included content specification, content gathering,
analysis and sequencing of learning tasks, and storyboarding. The Educational
Technologies Service staff worked closely with the author of this report
in designing the testing software that would run the CATs.
Product development and evaluation. This included computer code
development, graphics and video/audio development, integration of all content,
review, revision, evaluation, optimization, and documentation.
Working with the staff of the Educational Technologies Service, the author carried out CAT project management activities that included the following:
- Specifying the time line for development
- Selecting graduate student interns to work on the project (four graduate
students in Penn State's Department of Instructional Systems assisted
with the project)
- Supervising part-time assistants
- Securing copyright releases for the scanning of textbook photos for
the ESL CAT and permission to photograph subjects for digitized representations
of various activities used in the ESL CAT
- Scheduling and leading meetings
- Coordinating the integration of the test components
- Documenting development processes
Early in the project, the author of this article (and the Lead Faculty
Member on the CAT development team) established the framework for test development
and item writing, identified the levels of listening comprehension and the
listener functions targeted, and decided upon the formats of the questions
that would be included in the initial item banks (see discussion below).
Some of the ensuing development and implementation procedures included the
following tasks:
- Developing prototype screen layouts which would be suitable for the
various question types;
- Maintaining clarity, conciseness, and consistency among the screens
and the test-taking procedures;
- Deciding how students should "navigate" through the test;
- Developing appropriate graphics for the ESL CAT by scanning images or creating graphics using the MacDraw program;
- Taking photographs with the Canon XAP-SHOT image-digitizing camera
for inclusion in the ESL test;
- Touching up photos and graphics using graphics packages;
- Designing the title screen, introductory (orientation) screens, and
end-of-test screen, which reports the level of achievement;
- Designing and developing procedures to orient the student on how to take
the test;
- Developing the computer programs to create and run the test (described
elsewhere);
- Assembling the test (this included using the test editor that the Educational Technologies Service created to bring together all elements, including
the audio files, the graphics and photos, the text, and the item format);
- Supervising the field tests of the computer-assisted and the
computer-adaptive versions of the test in the Educational Technologies
Service lab;
- Maintaining the quality, accuracy, and consistency of the test items;
- Debugging technical problems related to the test creation and/or administration;
- Implementing data gathering for research (questionnaires were developed
to solicit test takers' reactions to the test design and item displays);
- Revising screen design as a result of formative evaluations;
- Revising items as necessary (i.e., re-inputting new text, audio, or
graphic and photos, as needed);
- Implementing and field testing the adaptive testing algorithm.
What is the programming environment of the ESL listening CAT like?
A brief discussion of the computer and the programming environment, as
well as the major components of the computer-adaptive test, follows:
1. The Hardware and Programming Environments. All parts of the
test were designed to run on an Apple Macintosh IIsi or other Macintosh computer running System 7.0 or later, with a minimum of 5 megabytes of RAM and access to a large amount of mass storage (either a local hard drive or access to an AppleShare server over Ethernet).
The hardware configuration used to create and deliver the test consisted
of a Macintosh IIfx with 20 megabytes of RAM (used for programming) and
a Macintosh IIci with 8 megabytes of RAM (used for constructing, demonstrating,
and field testing of the CAT).
The following software was used to create and run the test: C++ from the Macintosh Programmer's Workshop (MPW), which contains the normal set of programming tools (compilers, linkers, assemblers, etc.), including the C++ compiler; MacApp, an object-oriented application framework designed by Apple; and InsideOut by Sierra Software, a database engine library.
2. The Penn State Computer-Adaptive Test Shell. The computer-adaptive
test is composed of several components. This division allows the code to
be broken up into functional units, which makes debugging and extending
the code easier. The units are the Front End, the Test Manager,
and the Question Manager.
- The Front End includes the title screens and some student information screens, which accept demographic input from the students. The student information
is stored so that it can be output with the test results, if it seems desirable
to do so.
- The Test Manager controls the administration of the test. Its
major job is to actually present the questions to the student and accept
the student's responses. Essentially, the Test Manager asks the
Question Manager for a question, then displays it according
to the type of question. When the student has given an answer, the Test Manager informs the Question Manager of the student's answer and asks for another question.
- The Question Manager handles the selection of questions
and scoring. A sub-unit of this component also handles the storage of questions
and their components (text, pictures, sound names, etc.) using the InsideOut
database library. The Question Manager determines the question selection
algorithm. The current CAT algorithm used is one suggested by Henning (1987).
When the Question Manager determines that the test is over (when the estimate of the student's ability is accurate enough according to the CAT algorithm), it returns a NULL question to the Test Manager. (A minimal sketch of this Test Manager/Question Manager interplay appears below, after the description of the results files.)
- The Test Editor. To edit and create the tests, the Test Editor
application (called TestEdit) is used. It allows an individual who
knows nothing about the technical details of file formats and/or programming
to create and edit the CAT, if it is appropriate for the individual to
do so (e.g., a designated test supervisor or someone who wishes to add
institution-specific items to the bank). A great deal of code is shared
between the Test Editor and the computer-adaptive test. This sharing
not only saves programming time but also helps keep the two applications consistent and reduces opportunities for bugs.
- The Test Results Files. At the end of a test (if the test creator so desires), a results file is written to the hard disk. The file
is given a unique name so that students (or supervisors) may use it on
a shared network, if it is appropriate for them to do so. The file is a
standard ASCII text file and can be read by any Macintosh word processor,
or transferred easily to other computers. The first part of the file writes out the student information, as entered by the student, including the following: the number of lines in the information section (including this one); student name; student ID; student birthdate; starting date; starting time; total duration of the test (hh:mm:ss); examination center; how long the student has studied English; how many years the student has lived in an English-speaking country; the student's estimate of how well they speak English; the student's estimate of how well they understand English; the student's estimate of how well they read English; and the student's estimate of how well they write English. The second part writes out the number of questions administered for this test and then the response information for each question, one question per line, with fields separated by commas: question order (the index of the question); question ID (assigned by the editor); question type; number of choices in the question; the correct answer (1-4); the student's answer; the number of times the student played the sound; and question duration.
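For readers curious how this division of labor might look in code, the following sketch restates the Test Manager/Question Manager interplay in standard, self-contained C++. It is not the original MPW C++/MacApp implementation: the class and method names are assumptions based on the description above, the ability update is a crude stand-in, and the stopping rule here is simple bank exhaustion rather than a precision criterion.

#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

struct Question {
    int id;
    int numChoices;      // 2, 3, or 4 depending on level
    int correctAnswer;   // 1-based, as in the results file
};

// Selects questions and scores responses.  Returns nullptr (the "NULL
// question") when the test is over -- here simply when the bank is used up,
// standing in for the real rule that stops once the ability estimate is
// precise enough.
class QuestionManager {
public:
    explicit QuestionManager(std::vector<Question> bank) : bank_(std::move(bank)) {}
    const Question* nextQuestion() {
        return next_ < bank_.size() ? &bank_[next_++] : nullptr;
    }
    void recordAnswer(const Question& q, int chosen) {
        ability_ += (chosen == q.correctAnswer) ? 0.5 : -0.5;   // crude update
    }
    double ability() const { return ability_; }
private:
    std::vector<Question> bank_;
    std::size_t next_ = 0;
    double ability_ = 0.0;
};

// Presents each question, accepts the student's response, hands it back to
// the QuestionManager, and asks for another question.
class TestManager {
public:
    explicit TestManager(QuestionManager& qm) : qm_(qm) {}
    void run() {
        while (const Question* q = qm_.nextQuestion()) {
            int answer = present(*q);        // display according to question type
            qm_.recordAnswer(*q, answer);
        }
        std::cout << "final ability estimate: " << qm_.ability() << "\n";
    }
private:
    int present(const Question& q) {         // screen layout and audio omitted
        std::cout << "question " << q.id << " (" << q.numChoices << " choices)\n";
        return 1;                             // stand-in for the examinee's choice
    }
    QuestionManager& qm_;
};

int main() {
    QuestionManager qm({ {1, 2, 1}, {2, 3, 2}, {3, 4, 3} });
    TestManager tm(qm);
    tm.run();
}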
How was the initial item bank field tested (trialed)?
The ESL test was field tested using overhead transparencies of the CAT
screen displays and an audiotape of the dialogue and monologue stimuli and
the test questions. Two hundred fifty-five subjects took part in the initial
field testing at two testing sites: Georgia State University and The Pennsylvania
State University. The field testing provided the necessary statistics (or
Item-Response Theory calibrations) for each test item. These statistics
(or item calibrations) are associated with each test item and reside in
the computer as part of the test-item information.
All 144 items in the bank were administered in linear fashion; all 72
items designated by the item writers to be novice-level items were administered
first, then the 47 intermediate-level items and finally the 23 advanced-level
items were administered to intact groups of examinees at Georgia State and
Penn State. On an overhead transparency, the field-test administrator displayed
each of the test-item's options, which consisted of text or visuals, on
an overhead transparency. Student viewed the answer choices, heard the audio
stimuli and the test question, and then registered their responses on a
computer answer sheet. Each transparency was placed on the overhead projector
while the examinees were registering their responses to a previous test
item on the computer answer sheet; this procedure allowed the subjects the
chance to view the options on the overhead transparency before they listened
to the stimuli. The reading of the test directions and the administration
of the item bank took approximately 90 minutes. The test directions were
not presented via audiotape but were read by the test administrator who
had the opportunity to answer any questions examinees had about the task,
the types of items, and the testing procedures.
Statistical analysis of the responses to the paper-and-pencil version of the test yielded the IRT (Rasch) ability parameters (θs) that are being
used by the algorithm to drive the computer-adaptive version of the test.
(Unfortunately, a full discussion of the statistical analysis is not possible
within the scope of this paper.) The fact that the ability estimates were
gathered, to date, on a small sample of only 255 subjects presents a distinct problem for IRT analysis, since when sample sizes are small the Rasch parameters can prove relatively unstable. Therefore, additional field-testing is needed (and will be done) to establish more valid and stable parameters if the test is to be implemented for placement or achievement testing purposes in language programs.
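For readers unfamiliar with how Rasch calibrations drive a CAT, the following sketch shows the underlying arithmetic in standard C++: the Rasch model gives the probability of a correct response from the difference between the examinee's ability and the item's difficulty, and the ability estimate is updated from the responses observed so far. This is a generic maximum-likelihood (Newton-Raphson) update, not necessarily the Henning (1987) procedure used in the shell, and the difficulties and responses shown are invented for illustration.

#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Rasch model: probability of a correct response given ability (theta)
// and item difficulty (b), both on the same logit scale.
double pCorrect(double theta, double b) {
    return 1.0 / (1.0 + std::exp(-(theta - b)));
}

// One Newton-Raphson step toward the maximum-likelihood ability estimate
// given the responses observed so far (1 = correct, 0 = wrong).
double updateTheta(double theta, const std::vector<double>& b,
                   const std::vector<int>& correct) {
    double firstDeriv = 0.0, secondDeriv = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i) {
        double p = pCorrect(theta, b[i]);
        firstDeriv  += correct[i] - p;       // score function
        secondDeriv -= p * (1.0 - p);        // negative information
    }
    return theta - firstDeriv / secondDeriv;
}

int main() {
    std::vector<double> difficulties = {-1.2, -0.4, 0.3, 1.1};  // item calibrations
    std::vector<int>    responses    = { 1,    1,   0,   0 };
    double theta = 0.0;
    for (int iter = 0; iter < 10; ++iter)    // iterate until the estimate settles
        theta = updateTheta(theta, difficulties, responses);
    std::cout << "estimated ability (theta): " << theta << "\n";
}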
What is the next step?
Field-testing of the paper-and-pencil form of the original 144 items
must continue until a substantially large sample of examinees (500 to 1,000
subjects) is tested, and revision of the items in the bank by ESL and testing/measurement
specialists must also continue. In addition, the item bank needs to be expanded in the number and variety of test items. Finally, the test must
be subjected to further reliability and validity studies in its adaptive
form, as well as its paper-and-pencil form. An English-for-academic-purposes (EAP) item bank should also be constructed so that the CAT can be used
for EAP testing purposes. The testing shell will permit creation of various
kinds of listening CATs for particular purposes (e.g., helping to select applicants for admission to universities in Japan). Above all, the ESL CAT development project should inspire more teachers and researchers to begin
thinking about using and developing CATs for their own institution's assessment
purposes. However, we must be sure to recognize and acknowledge that CAT,
in and of itself, is no panacea or philosopher's stone of assessment. In
addition to being concerned about all the difficulties involved in learning
about computer adaptive testing, and finding out how (and whether) to use/create
a CAT, we must, above all, be concerned about developing valid, reliable,
and useful instruments, be they listening tests or others. We must, therefore,
be sure to recognize and agree that core principles of good test development
remain in full force, whether we are developing a CAT or a classroom exam
(see, for example, Bachman & Palmer, 1996; Buck, 1991; Dunkel, Henning
& Chaudron, 1993; Gorsuch & Griffee, 1997). Computerization is beginning
to open a whole new world of testing, but the world of the CAT developers
(and users) differs little from the world of the paper-and-pencil test developers
(and users) when it comes to their abiding by the core principles of competent
language testing set forth by Bachman and Palmer (1996, p. 9), including
having:
- An understanding of the fundamental considerations that must be addressed
at the start of any language testing effort, whether this involves the
development of new tests or the selection of existing language tests;
- An understanding of the fundamental issues and concerns in the appropriate
use of language tests;
- An understanding of the fundamental issues, approaches, and methods used
in measurement and evaluation;
- The ability to design, develop, evaluate and use language tests in
ways that are appropriate for a given purpose, context, and group of test
takers;
- The ability to critically read published research in language testing
and information about published tests in order to make informed decisions.
The author hopes that she has given some insight into the item mentioned
first in this list of requirements. The rest is up to you, the readers,
and to those who will be the developers and users of CATs in the coming
century.
Notes
1. Such was the case when the author of
this report began in 1988-89 to think about creating a listening comprehension
CAT. At that time, commercial CAT software packages (e.g., MicroCAT) were not equipped with an audio interface to provide a listening CAT, so the author
had to begin a software development project that created a CAT able to interface
text and graphics/photographs with digitized speech in the test items.
2. Work is presently underway at Georgia
State to create a bank of CAT items that assess students' preparedness to
be effective nonparticipatory listeners of English lectures at a university.
The English-for-academic-purposes (EAP) listening CAT will evaluate examinees' listening comprehension in terms of these same nine levels of achievement (novice-low through superior).
3. The item writers' intuition and pedagogical
experience guided the construction of the initial pool of items. Item Response
Theory (IRT) statistical analysis was then used to check the level of difficulty
(or easiness) associated with each item. Since the sample size providing
the IRT item parameters was quite small (n=255), caution needs to be exercised
when interpreting the initial set of IRT parameters that drive selection
of test items for examinees. Continued field testing should help determine whether the parameters are accurate or not.
References
American Council on the Teaching of Foreign Languages.
(1986). ACTFL Proficiency Guidelines. Hastings-on-Hudson, NY: ACTFL.
Bachman, L., & Palmer, A. (1996). Language Testing
in Practice. New York: Oxford University Press.
Bergstrom, B., & Gershon, R. (Winter 1994). Computerized
adaptive testing for licensure and certification. CLEAR Exam Review,
25-27.
Buck, G. (1991). The testing of listening comprehension:
An introspective study. Language Testing, 8, 67-91.
Chaloub-Deville, M. (in press). Important considerations
in constructing second language computer adaptive tests. Shiken. Japan: The Japan Association for Language Teaching.
Chaloub-Deville, M., Alcaya, C., & Lozier, V. (1996).
An operational framework for constructing a computer-adaptive test of L2
reading ability: Theoretical and practical issues. CARLA Working Paper
Series #1. Minneapolis, MN: University of Minnesota.
Dunkel, P., Henning, G., & Chaudron, C. (1993). The
assessment of a listening comprehension construct: A tentative model for
test specification and development. Modern Language Journal, 77,
180-191.
Dunkel, P. (1991). Computerized testing of nonparticipatory L2 listening comprehension proficiency: An ESL prototype development effort. Modern Language Journal, 75(1), 64-73.
Gorsuch, G., & Griffee, D. (1997). Creating and using classroom tests. The Language Teacher, 21, 27-31.
Hambleton, R. K., & Swaminathan, H. (1985). Item
Response Theory: Principles and Applications. Boston: Kluwer-Nijhoff.
Henning, G. (1987). A Guide to Language Testing: Development
and Evaluation. New York: Newbury House.
Hulin, C., Drasgow, F., & Parsons, C. (1983). Item Response Theory: Application to Psychological Measurement. Homewood, IL: Dow Jones-Irwin.
Lund, R. (1990). A taxonomy for teaching second language
listening. Foreign Language Annals, 23, 105-115.
Richards, J. C. (1983). Listening comprehension: Approach,
design, procedure. TESOL Quarterly, 17, 219-240.
Richards, J. C. (1985). The Context of Language Teaching.
New York: Cambridge University Press.
Young, R., Shermis, M., Brutten, S., & Perkins, K.
(1996). From conventional to computer-adaptive testing of ESL reading comprehension.
System, 24, 23-40.