A structural equation modeling approach for examining position effects in large-scale assessments

Position effects may occur in both paper-and-pencil tests and computerized assessments when examinees respond to the same test items placed in different positions on the test. To examine position effects in large-scale assessments, previous studies have often used multilevel item response models within the generalized linear mixed modeling framework. Building on the equivalence of the item response theory and binary factor analysis frameworks for modeling dichotomous item responses, this study introduces a structural equation modeling (SEM) approach capable of estimating various types of position effects. Using real data from a large-scale reading assessment, the SEM approach is demonstrated for investigating form, passage position, and item position effects for reading items. Results from a simulation study are also presented to evaluate the accuracy of the SEM approach in detecting item position effects. The implications of using the SEM approach are discussed in the context of large-scale assessments.
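The approach rests on the correspondence between the normal-ogive IRT model and binary factor analysis. The Python sketch below shows that mapping under these assumptions:

- the latent trait is standard normal;
- the link is probit;
- the latent response variance is fixed at 1 (the delta parameterization).

Under these conditions a standardized loading λ and threshold τ translate to a = λ / sqrt(1 − λ²) and b = τ / λ. The function names and the linear threshold shift `gamma` used to mimic an item position effect are hypothetical illustrations, not the parameterization used in the study.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal CDF for the probit (normal-ogive) link


def fa_to_irt(loading: float, threshold: float) -> tuple[float, float]:
    """Map a standardized binary-factor loading/threshold to normal-ogive (a, b).

    Assumes theta ~ N(0, 1) and latent response variance fixed at 1,
    so the residual variance of the latent response is 1 - loading**2.
    """
    if not 0.0 < loading < 1.0:
        raise ValueError("standardized loading must lie in (0, 1)")
    a = loading / math.sqrt(1.0 - loading**2)  # discrimination
    b = threshold / loading                     # difficulty
    return a, b


def irt_to_fa(a: float, b: float) -> tuple[float, float]:
    """Inverse mapping: normal-ogive (a, b) back to (loading, threshold)."""
    loading = a / math.sqrt(1.0 + a**2)
    threshold = b * loading
    return loading, threshold


def p_correct(theta: float, loading: float, threshold: float,
              position: int, gamma: float) -> float:
    """P(correct) for an item whose threshold shifts by `gamma` per position.

    Hypothetical linear position effect: tau_p = tau + gamma * (position - 1).
    A positive gamma makes the item harder the later it appears.
    """
    a, b = fa_to_irt(loading, threshold + gamma * (position - 1))
    return _STD_NORMAL.cdf(a * (theta - b))


if __name__ == "__main__":
    a, b = fa_to_irt(0.6, 0.3)
    print(round(a, 3), round(b, 3))  # loading 0.6, threshold 0.3 -> a = 0.75, b = 0.5
    for pos in (1, 10):
        print(pos, round(p_correct(0.0, 0.6, 0.3, pos, gamma=0.1), 4))
```

The sketch expresses the position effect as a threshold shift because, under this parameterization, a shift in τ is equivalent to a shift in IRT difficulty of γ(position − 1)/λ. Under the usual logistic approximation, multiplying `a` by about 1.7 gives the corresponding 2PL discrimination.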
