Exploring the Full-Information Bifactor Model in Vertical Scaling With Construct Shift

To address the lack of attention to construct shift in item response theory (IRT) vertical scaling, a multigroup bifactor model was proposed in which a general dimension is common to all grades and each grade has its own grade-specific dimension. The estimation accuracy of the bifactor model was evaluated in a simulation study that manipulated three factors: the percentage of common items, the sample size, and the degree of construct shift. The unidimensional IRT (UIRT) model, which ignores construct shift, was also estimated to represent current practice. The study found that (a) bifactor models were well recovered overall, though the grade-specific dimensions were not recovered as well as the general dimension; (b) item discrimination parameters were overestimated in UIRT models because of construct shift; (c) person parameters were estimated less accurately by UIRT models than by bifactor models; (d) group mean parameter estimates from UIRT models were less accurate than those from bifactor models; and (e) construct shift had a large effect on the group mean parameter estimates of UIRT models. A real data analysis illustrated how bifactor models can be applied to vertical scaling problems that involve construct shift. General procedures for testing practice were recommended and discussed.
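The structure described above can be illustrated with a minimal data-generation sketch. It is not the authors' simulation code: it assumes a compensatory two-parameter logistic bifactor form, two adjacent grades, a 16-item pool with four common items, and illustrative values for sample size, loadings, and group means. All function and variable names (`bifactor_prob`, `simulate_grade`, `shift`) are hypothetical.

```python
import numpy as np

def bifactor_prob(theta_general, theta_specific, a_general, a_specific, intercept):
    """P(correct) for each person-item pair under an assumed 2PL bifactor form.

    Every item loads on the general dimension and on one grade-specific
    dimension; setting a_specific to zero gives a unidimensional 2PL model.
    """
    logit = (np.outer(theta_general, a_general)
             + np.outer(theta_specific, a_specific)
             + intercept)
    return 1.0 / (1.0 + np.exp(-logit))

def simulate_grade(rng, n_persons, general_mean, a_general, a_specific, intercept):
    """Draw dichotomous responses for one grade (one group)."""
    # General ability: group mean differs by grade (growth across grades).
    theta_general = rng.normal(general_mean, 1.0, n_persons)
    # Grade-specific ability: assumed standard normal, independent of general.
    theta_specific = rng.normal(0.0, 1.0, n_persons)
    p = bifactor_prob(theta_general, theta_specific, a_general, a_specific, intercept)
    return (rng.random(p.shape) < p).astype(int)

rng = np.random.default_rng(0)

# Hypothetical design: 16-item pool, 10 items per grade, 4 common items.
n_pool = 16
a_general = rng.uniform(1.0, 2.0, n_pool)
intercept = rng.normal(0.0, 1.0, n_pool)
lower_items = np.arange(0, 10)
upper_items = np.arange(6, 16)
common_items = np.intersect1d(lower_items, upper_items)

# Degree of construct shift, represented here as the size of the
# grade-specific loadings (an assumption made for illustration).
shift = 0.5
a_spec_lower = np.full(lower_items.size, shift)
a_spec_upper = np.full(upper_items.size, shift)

resp_lower = simulate_grade(rng, 500, 0.0, a_general[lower_items],
                            a_spec_lower, intercept[lower_items])
resp_upper = simulate_grade(rng, 500, 0.5, a_general[upper_items],
                            a_spec_upper, intercept[upper_items])
```

In this sketch the common items carry the same general-dimension parameters in both grades, which is what allows the two groups to be placed on one scale, while the grade-specific loadings absorb the part of each test that is unique to its grade.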
