Equivalence of Reading and Listening Comprehension Across Test Media

Whether an ability test delivered on paper or on computer provides the same information is an important question in applied psychometrics. If the test medium affects performance, not only the validity but also the fairness of the measure is at stake. This study provides a comprehensive review of existing equivalence research on reading and listening comprehension in English as a foreign language and identifies factors that are likely to affect equivalence. Taking these factors into account, comprehension measures were developed and administered to N = 442 high school students. Multigroup confirmatory factor analysis showed that both reading and listening comprehension were measurement invariant across test media. Nevertheless, it is argued that the equivalence of data gathered on paper and on computer depends on the specific measure or construct, on the participants and the recruitment mechanisms, and on the software and hardware used. Equivalence research is therefore required for specific instantiations unless generalizable knowledge about the factors affecting equivalence is available. Multigroup confirmatory factor analysis is an appropriate and effective tool for assessing the comparability of test scores across test media.
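To make the analytic approach concrete, the following is a minimal sketch of the multigroup common factor model and the nested invariance constraints (configural, metric, scalar, strict) that are typically tested in such a design; the notation is generic and not the study's exact specification.

\begin{align*}
  \mathbf{x}_{ig} &= \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\xi}_{ig} + \boldsymbol{\delta}_{ig} && \text{(configural: same factor pattern in each group } g \text{, paper vs. computer)}\\
  \boldsymbol{\Lambda}_1 &= \boldsymbol{\Lambda}_2 && \text{(metric: equal factor loadings)}\\
  \boldsymbol{\tau}_1 &= \boldsymbol{\tau}_2 && \text{(scalar: equal intercepts or thresholds)}\\
  \boldsymbol{\Theta}_1 &= \boldsymbol{\Theta}_2 && \text{(strict: equal residual variances)}
\end{align*}

Each level is evaluated against the preceding, less restrictive model via changes in model fit; support for at least scalar invariance is commonly taken as evidence that observed scores can be compared across test media.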
