A Meta-Analysis of Reliability Coefficients in Second Language Research

Ensuring internal validity in quantitative research requires, among other conditions, reliable instrumentation. However, second language (L2) researchers often fail to report reliability estimates and, even more often, fail to interpret them beyond generic benchmarks for acceptability. To guide interpretations of such estimates, this article meta-analyzes reliability coefficients (internal consistency, interrater, and intrarater) as reported in published L2 research. We recorded 2,244 reliability estimates in 537 individual articles, along with study features (e.g., sample size) and instrument features (e.g., item formats) proposed to influence reliability. We also coded for the indices employed (e.g., alpha, KR20). The coefficients were then aggregated (i.e., meta-analyzed). The three types of reliability varied, with internal consistency the lowest (median = .82). Interrater and intrarater estimates were substantially higher (.92 and .95, respectively). Overall estimates were also found to vary according to study and instrument features such as proficiency (low = .79, intermediate = .84, advanced = .89) and target skill (e.g., writing = .88 vs. listening = .77). We use our results to inform and encourage interpretations of reliability estimates relative to the larger field as well as to the substantive and methodological features particular to individual studies and subdomains.
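
The aggregation step described in the abstract, summarizing coded coefficients by reliability type with a median, can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the function name `median_by_group`, and every coefficient value are hypothetical and are not taken from the study's data or procedures.

```python
from collections import defaultdict
from statistics import median

# Hypothetical coded records: (reliability type, reported coefficient).
# These values are invented for illustration and are NOT the study's data.
records = [
    ("internal_consistency", 0.78),
    ("internal_consistency", 0.82),
    ("internal_consistency", 0.90),
    ("interrater", 0.88),
    ("interrater", 0.94),
    ("intrarater", 0.95),
]


def median_by_group(rows):
    """Return the median coefficient for each group label."""
    grouped = defaultdict(list)
    for label, coefficient in rows:
        grouped[label].append(coefficient)
    return {label: median(values) for label, values in grouped.items()}


summary = median_by_group(records)
for label, value in sorted(summary.items()):
    print(f"{label}: median = {value:.2f}")
```

The same grouping could be keyed on any coded moderator (for example proficiency level or target skill) by changing the first element of each record.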
