Require Confidence Intervals for Effect Size Estimates in JALT Journal

Adam Lebowitz

In issue 38.4, I proposed that university Foreign Language Centers form networks for research collaboration. Methodologically, such networks could improve statistical power and help JALT as a national organization fulfill its primary purpose to “…(improve) language teaching and learning in Japan…” (emphasis added, Article 3 of the Constitution). Most empirical research is assumed to be significant for the whole population. Therefore, the most important outcome should be effect size (as opposed to between-group variance), and perhaps eta-squared in particular, since sample factors (age, gender, international posture, etc.) are “naturally occurring” for all students (Kline, 2004).

However, until Language Center networks are established, JALT Journal could adjust its analytical practice to clarify “true” population scores by requiring that confidence intervals (CIs) be reported around effect size results. Jacob Cohen (1994) advised 90% CIs, and soon after, the American Psychological Association Board of Scientific Affairs Task Force on Statistical Inference recommended (but did not require) the same for its publications (Wilkinson & APA Task Force on Statistical Inference, 1999). Requiring effect size CIs is plausible since JALT Journal follows APA guidelines. CIs also aid meta-analysis by underscoring the importance of replicability (Steiger, 2004; Thompson, 2002). That is, “true” population parameters can only be realistically estimated through multiple, stable studies (Schmidt, 1996). Graphing effect size intervals also makes it easier to visualize where that “true” region lies among overlapping intervals.

Just as comparable effect size values within CIs could establish plausible parameters for a national population, they could also show whether differences between nationalities truly exist. For example, different correlations between international posture and the ideal self have been reported for Hungarian (r = .51) and Japanese students (r = .43) (Kormos & Csizér, 2008; Yashima, 2009), converting to η² = .26 and η² = .19, respectively. But how “different” are these two values really? Without CIs, it is impossible to see whether the intervals containing these values overlap.
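To make the overlap question concrete, the following is a minimal sketch in Python, not the procedure of any study cited here. It uses the Fisher z-transform, a common large-sample approximation for a correlation CI, and then squares the bounds to express them as variance explained. The sample size `N_HYPOTHETICAL` is an invented placeholder, because the original sample sizes are not given in this article, and the function names are illustrative only. Squaring the rounded r = .43 gives roughly .18 rather than .19, so the published figure may rest on an unrounded coefficient.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.90):
    """Approximate CI for a Pearson correlation using the Fisher z-transform."""
    z = math.atanh(r)                                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)                      # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)   # two-sided critical value
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


def squared_ci(r, n, level=0.90):
    """Square the bounds of the r interval to express them as variance explained.

    Squaring the bounds is only meaningful when both share the same sign.
    """
    lo, hi = fisher_ci(r, n, level)
    if lo * hi < 0:
        raise ValueError("interval for r spans zero; squared bounds are not monotone")
    return tuple(sorted((lo ** 2, hi ** 2)))


def overlaps(a, b):
    """Return True when two (lower, upper) intervals share any region."""
    return a[0] <= b[1] and b[0] <= a[1]


# HYPOTHETICAL sample size: a placeholder, not taken from either cited study.
N_HYPOTHETICAL = 200

hungary = squared_ci(0.51, N_HYPOTHETICAL)
japan = squared_ci(0.43, N_HYPOTHETICAL)

print(f"Hungary: r^2 = {0.51 ** 2:.3f}, 90% CI = [{hungary[0]:.3f}, {hungary[1]:.3f}]")
print(f"Japan:   r^2 = {0.43 ** 2:.3f}, 90% CI = [{japan[0]:.3f}, {japan[1]:.3f}]")
print("Intervals overlap:", overlaps(hungary, japan))
```

With the real sample sizes substituted, the same comparison would indicate whether the two reported values are distinguishable. Intervals based on noncentral distributions, such as those discussed by Steiger (2004), may differ somewhat from this approximation.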

Calculating effect size CIs may seem daunting because SPSS has no function for it. Instead, use the Methods for the Behavioral, Educational, and Social Sciences (MBESS) package for R, which is free, open-source software.



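Cohen, J. (1994). The Earth is round (p < .05). American Psychologist, 49(12), 997–1003.

Kline, R. B. (2004). Beyond significance testing: Reforming data analysis methods in behavioral research. Washington, DC: American Psychological Association.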

Kormos, J., & Csizér, K. (2008). Age-related differences in the motivation of learning English as a foreign language: Attitudes, selves, and motivated learning behavior. Language Learning, 58(2), 327–355.

Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115–129.

Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9(2), 164–182.

Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31(3), 25–32.

Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604.

Yashima, T. (2009). International posture and the ideal L2 self in the Japanese EFL context. In Z. Dörnyei & E. Ushioda (Eds.), Motivation, language identity and the L2 self (pp. 144–163). Bristol: Multilingual Matters.