How to Create Valid Reading Comprehension Tests to Measure Improvement

Date:

August 2025

Issue:

JALT Postconference Publication - Issue 2024.1; August 2025

https://doi.org/10.37546/JALTPCP2024-08

Writer(s):

Sachi Oshima, Chuo Gakuin University

Reference Data:

Oshima, S. (2025). How to create valid reading comprehension tests to measure improvement. In B. Lacy, M. Swanson, & P. Lege (Eds.), Moving JALT Into the Future: Opportunity, Diversity, and Excellence. JALT. https://doi.org/10.37546/JALTPCP2024-08

To measure improvement in students’ reading performance, teachers and researchers often administer multiple reading tests at different points during a course, typically a pretest, a midterm, and a posttest. Although using an identical test allows scores obtained at different points to be compared directly, it introduces a “testing threat” that undermines validity (Trochim et al., 2016). If the same test is used repeatedly, students’ scores might improve, but this does not necessarily mean that their reading performance has improved: they might remember the content of the pretest reading texts and items, which lowers the effective difficulty of the posttest. Using different tests built on different texts addresses this threat but introduces another problem: if the reading passages vary considerably in difficulty, the tests cannot validly measure whether students’ reading performance has actually improved. This paper demonstrates a five-step solution to this issue: (a) selecting the reading passages used for the tests, (b) analyzing and adjusting the lexical and readability level of the passages, (c) creating question items based on the difficulty level of questions (Burrows, 2012; Lumley, 1993), (d) conducting alpha and beta testing (Fulcher & Davidson, 2007), and (e) employing Rasch analysis to ensure comparable difficulty estimates across the reading tests.

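To measure improvement in students’ English reading (text comprehension) ability, teachers and researchers often administer tests multiple times (e.g., a pretest at the start of a course, a midterm in the middle, and a posttest at the end). Administering the same test multiple times makes direct score comparison possible, but the effect of the testing threat (Trochim et al., 2016) on validity must be considered. When the same test is administered repeatedly, students’ scores may improve, but this does not necessarily mean that their English reading ability has improved, because students’ memory of the pretest content can lower the difficulty of the posttest. To safeguard validity, it is therefore desirable to prepare tests that use different texts, but this raises another problem: if the texts differ in difficulty to begin with, administering tests of differing difficulty is not a valid way to verify improvement in students’ reading ability. As a concrete solution, this paper presents five steps: (a) selecting the texts used for the tests, (b) analyzing and adjusting vocabulary and readability levels, (c) writing question items with item difficulty in mind (Burrows, 2012; Lumley, 1993), (d) conducting alpha and beta testing (Fulcher & Davidson, 2007), and (e) conducting Rasch analysis to ensure that the multiple tests are of equivalent difficulty.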
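Step (b), checking that candidate passages sit at a similar readability level, can be illustrated with a small sketch. The abstract does not say which readability index the paper uses, so the Flesch-Kincaid Grade Level here is an assumption chosen for illustration. The syllable counter is a rough vowel-group heuristic, and the function names (`count_syllables`, `flesch_kincaid_grade`, `grade_spread`) are hypothetical rather than taken from the paper.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent ("made"), except in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of an English passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def grade_spread(passages: dict[str, str]) -> float:
    """Gap between the hardest and easiest passage, in grade levels."""
    grades = [flesch_kincaid_grade(text) for text in passages.values()]
    return max(grades) - min(grades)


# Hypothetical placeholder passages; real test texts would go here.
passages = {
    "pretest": "The cat sat on the mat. The dog ran.",
    "posttest": "The boy ran to the park. He met a friend.",
}
for name, text in passages.items():
    print(name, round(flesch_kincaid_grade(text), 2))
print("spread:", round(grade_spread(passages), 2))
```

A large spread would suggest adjusting one of the passages before writing items. A formula like this only approximates text difficulty, which is presumably why the procedure goes on to pilot testing and Rasch analysis.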

PDF:

jalt2024-pcp-008.pdf
