Assessing the Hypothesis of Measurement Invariance in the Context of Large-Scale International Surveys

In the field of international educational surveys, the equivalence of achievement scale scores across countries has received substantial attention in the academic literature; an emphasis on scale score equivalence in nonachievement education surveys, however, has emerged only relatively recently. Given the current state of research on multiple-group models, these recent measurement invariance investigations have relied on supporting research that was limited to few groups and relatively small sample sizes. To address this gap, this study uses data from one large-scale survey as a basis for examining the extent to which typical fit measures used in multiple-group confirmatory factor analysis are suitable for detecting measurement invariance in a large-scale survey context. Using measures validated in a smaller-scale context and an empirically grounded simulation study, we find that many typical measures and their associated criteria are either unsuitable in a context with many groups and varied sample sizes or should be adjusted, particularly when the number of groups is large. We provide specific recommendations and discuss further areas for research.
