Effect Size vs. Regression: Some Discourse Issues

Adam Lebowitz

In TLT 39/2, I suggested JALT Journal follow American Psychological Association recommendations and report effect size confidence intervals (CIs) in research results. When compared across replication studies and meta-analyses, these intervals can provide the closest approximation of a population’s “true” scores. Short of a system promoting empirical collaborations across regions, this is probably the easiest way to get a “bird’s-eye view” of a target population and fulfill JALT’s mandate as a national organization.
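As a rough sketch of what reporting an effect size with a confidence interval involves, the snippet below computes Cohen’s d for two independent groups with a normal-approximation CI. The function name, the example numbers, and the large-sample standard-error formula are my own illustration, not from the text above.

```python
from math import sqrt

def cohens_d_ci(m1, sd1, n1, m2, sd2, n2, z=1.96):
    """Cohen's d for two independent groups, with an approximate
    95% confidence interval (normal approximation to the SE of d)."""
    # Pooled standard deviation across the two groups
    sp = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    # Widely used large-sample approximation for the standard error of d
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)

# Hypothetical data: a "medium" effect (d = .5) from two groups of 50
d, (lo, hi) = cohens_d_ci(52.0, 10.0, 50, 47.0, 10.0, 50)
```

With these invented numbers the interval spans roughly 0.10 to 0.90, a reminder of how imprecise a single medium-sized study’s estimate can be, and why pooling CIs across replications matters.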

This suggestion, however, presupposes a more basic question: should results be reported through effect size indices or through regression coefficients? I am firmly in the effect size camp, and prefer controlling covariates to running multiple regression. One reason is that the results are much easier to understand, but another is more “discursive”: while (to me at least) effect size shows what is happening, regression coefficients, and models in general, encourage predictions, that is, what should be happening. This is very seductive for PhD students and other young researchers trying to make a mark with a high R². Indeed, model fit is extremely useful when examining construct validity.

Unfortunately, where the desire for good-fitting models is too strong (e.g., insisting on a GFI of .95 or higher), the flipside is the undesirability of bad-fitting models. We should therefore remember what Bentler and Bonett actually said about indices such as the Tucker-Lewis index and the NFI: a value below .90 is not bad per se; it simply indicates room for improvement. More importantly, researchers are “unnecessarily dejected by their inability to account for every bit of sample variation” (Bentler & Bonett, 1980, p. 604). Since acceptable assessment methods and cut-offs are still under discussion (Lance, Butts, & Michels, 2006), we should remember that life does not always cooperate with models, or with statistics in general.
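For concreteness, the two indices Bentler and Bonett discuss can be computed directly from the model and baseline (null) chi-square statistics. The formulas below are the standard definitions; the fit statistics themselves are invented for illustration.

```python
def nfi(chi2_null, chi2_model):
    """Bentler-Bonett Normed Fit Index: proportional improvement
    in chi-square over the baseline (null) model."""
    return (chi2_null - chi2_model) / chi2_null

def tli(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis Index (Non-Normed Fit Index), which penalizes
    model complexity via the degrees of freedom."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1)

# Hypothetical fit statistics: the same model can fall on either
# side of the conventional .90 line depending on the index chosen.
print(round(nfi(1000.0, 120.0), 3))          # 0.88
print(round(tli(1000.0, 45, 120.0, 40), 3))  # 0.906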

Putting too much stock in modeling, then, may encourage prediction rather than observation and discovery. At worst, a “wrong model” carries the risk of overlooking something else going on (a Type II error). Naturally, good research methods do everything possible to avoid this, through power analysis and a less stringent alpha level.
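A minimal power-analysis sketch, using the standard normal approximation for a two-tailed, two-sample comparison. The function and its defaults are illustrative assumptions, not part of the original text; exact t-based calculations give slightly larger answers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to detect a
    standardized mean difference d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-tailed alpha
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A "medium" effect (d = .5) at conventional alpha and power:
print(n_per_group(0.5))  # 63 per group
```

Planning for adequate power in this way, rather than chasing fit indices after the fact, is the more direct guard against the Type II error described above.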

Overall, when creating models we should acknowledge that their overuse in education and the behavioral sciences (where ESL resides) can foster an anti-empirical mindset. Nothing dulls the good sense of a teacher more.


Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88(3), 588-606.

Lance, C. E., Butts, M. M., & Michels, L. C. (2006). The sources of four commonly reported cutoff criteria: What did they really say? Organizational Research Methods, 9(2), 202-220. doi:10.1177/1094428105284919