Page No.: 
28
Writer(s): 
Sachi Oshima, Chuo Gakuin University

This article was originally published in the Selected Papers section of the 2024 Postconference Publication (PCP), Moving JALT Into the Future: Opportunity, Diversity, and Excellence. The PCP publishes papers based on presentations given at the JALT International Conference, and the Selected Papers section highlights a small number of papers of exceptional quality that are first suggested by the editorial staff and then vetted by the JALT Publications Board through a blind review process. We feel that papers like this one represent some of the best work that the JALT Conference and the PCP have to offer, and we encourage interested readers to check out other selected papers at https://jalt-publications.org/proceedings.

Reference Data: 

Oshima, S. (2025). How to create valid reading comprehension tests to measure improvement. In B. Lacy, M. Swanson, & P. Lege (Eds.), Moving JALT Into the Future: Opportunity, Diversity, and Excellence. JALT. 

 

To measure improvement in students’ reading performance, teachers and researchers often administer multiple reading tests at different points during a course—typically a pretest, midterm, and posttest. Although using an identical test enables researchers to compare test scores obtained at multiple points directly, there is a testing threat that negatively affects validity (Trochim et al., 2016). If the same test is used repeatedly, students might improve their scores. However, this does not necessarily mean that their reading performance has improved, because they might remember the content of the pretest reading texts and items, which lowers the difficulty of the posttest. Using different tests consisting of different texts is expected to address this validity threat, but it introduces another problem: If the reading passages vary considerably in difficulty, then the tests cannot validly measure whether students’ reading performance has actually improved. The purpose of this paper is to demonstrate a five-step solution to this issue: (a) selecting the reading passages used for the tests, (b) analyzing and adjusting the lexical and readability levels of the passages, (c) creating question items based on the difficulty level of the questions (Burrows, 2012; Lumley, 1993), (d) conducting alpha and beta testing (Fulcher & Davidson, 2007), and (e) employing Rasch analysis to ensure comparable difficulty estimates among the multiple reading tests.

To measure improvement in students’ English reading comprehension, teachers and researchers often administer tests at multiple points, such as a pretest at the start of a course, a midterm, and a posttest at the end. Administering the same test repeatedly allows scores to be compared directly, but the effect of the testing threat (Trochim et al., 2016) on validity must be considered. When the same test is given more than once, students’ scores may rise, yet this does not necessarily indicate improved reading ability, because students may remember the content of the pretest, which lowers the difficulty of the posttest. To preserve validity, it is therefore preferable to prepare tests that use different passages, but this raises another problem: if the passages themselves differ in difficulty, administering tests of unequal difficulty cannot validly verify improvement in students’ reading ability. This paper therefore introduces a concrete five-step solution: (a) selecting the passages used for the tests, (b) analyzing and adjusting their lexical and readability levels, (c) writing items with question difficulty in mind (Burrows, 2012; Lumley, 1993), (d) conducting alpha and beta testing (Fulcher & Davidson, 2007), and (e) conducting Rasch analysis to ensure that the multiple tests are of comparable difficulty.
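The paper’s second step calls for analyzing and adjusting the lexical and readability level of candidate passages. As a minimal sketch of how such a check might be scripted, the example below uses the third-party Python library textstat to compare hypothetical passages on two common readability indices; the library, the sample passages, and the comparison approach are illustrative assumptions, not part of the paper’s prescribed procedure.

    # Minimal sketch of step (b): comparing the readability of candidate passages.
    # Assumes the third-party textstat package (pip install textstat); the paper
    # does not prescribe a particular tool, so this is purely illustrative.
    import textstat

    # Hypothetical candidate passages for the pretest, midterm, and posttest forms.
    passages = {
        "pretest": "Open the window and look at the small garden behind the house.",
        "midterm": "The weather changed quickly, and the students hurried back inside.",
        "posttest": "Many readers enjoy short stories because they can finish them in one sitting.",
    }

    for name, text in passages.items():
        ease = textstat.flesch_reading_ease(text)    # higher score = easier text
        grade = textstat.flesch_kincaid_grade(text)  # approximate U.S. grade level
        print(f"{name}: reading ease = {ease:.1f}, grade level = {grade:.1f}")

    # If one passage is clearly harder than the others (for example, a much lower
    # reading-ease score), it can be simplified or replaced before items are written.

A fuller analysis would also profile the passages’ vocabulary against word-frequency lists and, after piloting, compare the forms’ item difficulty estimates through Rasch analysis, as the paper’s later steps describe.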

 
