PISA test format assessment and the local independence assumption

Large-scale assessments of reading comprehension, notably the OECD's Programme for International Student Assessment (PISA) and the IEA's Progress in International Reading Literacy Study (PIRLS), generally use paper-and-pencil tests in which a reading passage is presented to the student together with several questions based on it. The PISA mathematics and science literacy tests follow the same hierarchically embedded structure, with several items attached to a common stimulus. In these surveys, cognitive data are scaled with an item response theory (IRT) model. One of the cornerstones of standard IRT models is the assumption of local item independence (LII). Because multiple items are linked to a common passage, items within a unit are unlikely to be conditionally independent, so the LII assumption may be violated. In the first part of this study, Yen's Q3 statistic was used to evaluate the magnitude of the local item dependence (LID) effect in the PISA 2000 and PISA 2003 data. The consequences of violating the LII assumption for the distribution of student performance were then explored. Moderate but clear global context dependencies were detected in a large number of the PISA reading and mathematics units, and some reading and mathematics units showed additional significant pairwise local dependencies. Furthermore, LID affected the variability of the student proficiency estimates, and the bias in the variability estimate was strongly correlated with average country performance. As a consequence of the LII violation in PISA, the relative variability of low-performing countries is overestimated while the relative variability of high-performing countries is underestimated.
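
To illustrate the kind of diagnostic the study relies on, the sketch below computes Yen's Q3 statistic, the correlation between two items' residuals (observed response minus model-implied probability) across examinees. This is a minimal, self-contained example using a simple Rasch model with simulated abilities and difficulties; the variable names (theta, beta) and the data are illustrative assumptions and do not reproduce the PISA scaling procedure used in the paper.

```python
# Minimal sketch of Yen's Q3 statistic for dichotomous items,
# assuming a Rasch model with abilities and difficulties treated as known.
# All data here are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 2000, 10

theta = rng.normal(0.0, 1.0, size=n_persons)   # person abilities (assumed)
beta = rng.normal(0.0, 1.0, size=n_items)       # item difficulties (assumed)

# Rasch model success probabilities and simulated 0/1 responses
prob = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
responses = (rng.uniform(size=prob.shape) < prob).astype(int)

# Residuals: observed response minus model-implied probability
residuals = responses - prob

def q3(residuals, i, j):
    """Q3 for items i and j: correlation of their residuals across persons."""
    return np.corrcoef(residuals[:, i], residuals[:, j])[0, 1]

# Q3 values for all item pairs (10 x 10 correlation matrix of residuals)
q3_matrix = np.corrcoef(residuals, rowvar=False)
print(q3(residuals, 0, 1))
```

Under local item independence, pairwise Q3 values are expected to scatter around a small negative value (roughly -1/(k-1) for k items), so markedly positive Q3 values for items sharing a passage or stimulus are taken as evidence of LID.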
