The Sensitivity of Value-Added Modeling to the Creation of a Vertical Score Scale

The purpose of this study was to evaluate the sensitivity of growth and value-added modeling to the way an underlying vertical score scale has been created. Longitudinal item-level data, with both student- and school-level identifiers, were analyzed for the entire state of Colorado between 2003 and 2006. Eight different vertical scales were established on the basis of choices made for three key variables: the item response theory modeling approach, the calibration approach, and the student proficiency estimation approach. Each scale represented a methodological approach that was psychometrically defensible. Longitudinal values from each scale were used as the outcome in a commonly used value-added model (the layered model popularized by William Sanders) as a means of estimating school effects. Our findings suggest that while the rank ordering of estimated school effects is insensitive to the underlying vertical scale, the precision of such value-added estimates can be quite sensitive to the combination of choices made in the creation of the scale.
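The eight scales arise from crossing two options on each of the three scale-construction choices. A minimal sketch of that design, using hypothetical option labels (the abstract does not name the specific options, so the labels below are assumptions, not the study's actual conditions):

```python
from itertools import product

# Hypothetical labels for the three scale-construction choices named in the
# abstract; the two options shown per choice are illustrative assumptions.
irt_models = ["1PL", "3PL"]                # IRT modeling approach
calibrations = ["separate", "concurrent"]  # calibration approach
estimators = ["ML", "EAP"]                 # proficiency estimation approach

# Crossing two options per choice yields the eight candidate vertical scales.
scales = [
    {"irt_model": m, "calibration": c, "estimator": e}
    for m, c, e in product(irt_models, calibrations, estimators)
]

print(len(scales))  # 8
```

Each of the eight scale variants would then supply the longitudinal outcome scores fed into the same layered value-added model, so that differences in the resulting school-effect estimates can be attributed to the scaling choices alone.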

[1]  R. Lissitz,et al.  An Evaluation of the Accuracy of Multidimensional IRT Linking , 2000 .

[2]  N. Petersen,et al.  A Test of the Adequacy of Curvilinear Score Equating Models , 1983 .

[3]  Douglas N. Harris,et al.  Would Accountability Based on Teacher Value Added Be Smart Policy? An Examination of the Statistical Properties and Policy Alternatives , 2009, Education Finance and Policy.

[4]  Georg Rasch,et al.  Probabilistic Models for Some Intelligence and Attainment Tests , 1981, The SAGE Encyclopedia of Research Design.

[5]  Daniel F. McCaffrey,et al.  Evaluating Value-Added Models for Teacher Accountability. Monograph. , 2003 .

[6]  Anton Beguin,et al.  Effect of Multidimensionality on Separate and Concurrent estimation in IRT Equating. , 2000 .

[7]  Dale Ballou,et al.  Test Scaling and Value-Added Measurement , 2009, Education Finance and Policy.

[8]  M. R. Novick,et al.  Statistical Theories of Mental Test Scores. , 1971 .

[9]  R. Mislevy Linking Educational Assessments: Concepts, Issues, Methods, and Prospects. , 1992 .

[10]  K. Frank,et al.  The Metric Matters: The Sensitivity of Conclusions About Growth in Student Achievement to Choice of Metric , 1994 .

[11]  Daniel F. McCaffrey,et al.  Bayesian Methods for Scalable Multivariate Value-Added Assessment , 2007 .

[12]  R. Darrell Bock,et al.  Multiple Group IRT , 1997 .

[13]  D. Harris The Policy Uses and “Policy Validity” of Value-Added and Other Teacher Quality Measures , 2008 .

[14]  A. Standen Causes and Effects. , 1957, Science.

[15]  S. Raudenbush Schooling, Statistics, and Poverty: Can We Measure School Improvement?. , 2004 .

[16]  R. Linn,et al.  An Exploration of the Adequacy of the Rasch Model For the Problem of Vertical Equating. , 1978 .

[17]  F. Lord Applications of Item Response Theory To Practical Testing Problems , 1980 .

[18]  Paul Wright,et al.  Controlling for Student Background in Value-Added Assessment of Teachers , 2004 .

[19]  T. C. Oshima,et al.  Multidimensional Linking: Four Practical Approaches , 2000 .

[20]  Scale Shrinkage in Vertical Equating , 1993 .

[21]  Mark R. Wilson,et al.  On choosing a model for measuring , 2003 .

[22]  Carol L Cone,et al.  Causes and effects. , 2003, Harvard business review.

[23]  Anthony S. Bryk,et al.  Hierarchical Linear Models: Applications and Data Analysis Methods , 1992 .

[24]  R. D. Bock,et al.  Adaptive EAP Estimation of Ability in a Microcomputer Environment , 1982 .

[25]  SAMPLING OF COMMON ITEMS: AN UNRECOGNIZED SOURCE OF ERROR IN TEST EQUATING1 , 2004 .

[26]  F. Modigliani,et al.  DIVIDEND POLICY, GROWTH, AND THE VALUATION OF SHARES , 1961 .

[27]  G. Masters A rasch model for partial credit scoring , 1982 .

[28]  E. Muraki A Generalized Partial Credit Model: Application of an EM Algorithm , 1992 .

[29]  R. L. Lim Linking Results of Distinct Assessments , 1993 .

[30]  Edward H. Haertel,et al.  Sampling of Common Items: An Unrecognized Source of Error in Test Equating. CSE Report 636. , 2004 .

[31]  Terry A. Ackerman A Didactic Explanation of Item Bias, Item Impact, and Item Validity from a Multidimensional Perspective , 1992 .

[32]  Donald B. Rubin,et al.  A Potential Outcomes View of Value-Added Assessment in Education , 2004 .

[33]  M. J. Kolen COMPARISON OF TRADITIONAL AND ITEM RESPONSE THEORY METHODS FOR EQUATING TESTS , 1981 .

[34]  Dorothy T. Thayer,et al.  The Kernel Method of Test Equating , 2003 .

[35]  Education Professional Standards Board Evaluating Value-Added models for Teacher Accountability , 2004 .

[36]  David Thissen,et al.  Item Response Theory for Items Scored in Two Categories , 2001 .

[37]  Henry Braun Using Student Progress to Evaluate Teachers: A Primer on Value-Added Models. Policy Information Perspective. , 2005 .

[38]  Robert W. Lissitz,et al.  Vertical Equating for State Assessments: Issues and Solutions in Determination of Adequate Yearly Progress and School Accountability. , 2003 .

[39]  R. D. Bock,et al.  Marginal maximum likelihood estimation of item parameters , 1982 .

[40]  G. Camilli,et al.  Scale Shrinkage and the Estimation of Latent Distribution Parameters , 1988 .

[41]  Mark Wilson,et al.  Constructing Measures: An Item Response Modeling Approach , 2004 .

[42]  Benjamin D. Wright,et al.  A History of Social Science Measurement , 2005 .

[43]  F. Baker,et al.  Item response theory : parameter estimation techniques , 1993 .

[44]  R. D. Bock,et al.  Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm , 1981 .

[45]  Anton A. Béguin,et al.  Obtaining a Common Scale for Item Response Theory Item Parameters Using Separate Versus Concurrent Estimation in the Common-Item Equating Design , 2002 .

[46]  V. S. Williams,et al.  A Comparison of Developmental Scales Based on Thurstone Methods and Item Response Theory , 1998 .

[47]  Linda L. Cook,et al.  Irt Versus Conventional Equating Methods: A Comparative Study of Scale Stability , 1983 .

[48]  Thomas A Louis,et al.  Jump down to Document , 2022 .

[49]  Robert L. Linn,et al.  The Rasch Model, Objective Measurement, Equating, and Robustness , 1979 .

[50]  J. Singer,et al.  Applied Longitudinal Data Analysis , 2003 .

[51]  Susan E. Holmes UNIDIMENSIONALITY AND VERTICAL EQUATING WITH THE RASCH MODEL , 1982 .

[52]  Frank B. Baker,et al.  Item Response Theory : Parameter Estimation Techniques, Second Edition , 2004 .

[53]  Daniel M. Lewis,et al.  Aligning Policy and Methodology to Achieve Consistent Across-Grade Performance Standards , 2005 .

[54]  Joseph A. Martineau Distorting Value Added: The Use of Longitudinal, Vertically Scaled Student Achievement Data for Growth-Based, Value-Added Accountability , 2006 .

[55]  D. R. Divgi Model-Free Evaluation of Equating and Scaling , 1981 .

[56]  W. M. Yen Increasing item complexity: A possible cause of scale shrinkage for unidimensional item response theory , 1985 .

[57]  A. A. Davier,et al.  Test equating, scaling, and linking. Methods and practices , 2006 .

[58]  E. Muraki A Generalized Partial Credit Model , 1997 .

[59]  R. C. Sykes,et al.  Concurrent and Separate Grade-Groups Linking Procedures for Vertical Scaling , 2008 .

[60]  Wendy M. Yen,et al.  THE CHOICE OF SCALE FOR EDUCATIONAL MEASUREMENT: AN IRT PERSPECTIVE , 1986 .

[61]  Derek C. Briggs,et al.  The Impact of Vertical Scaling Decisions on Growth Interpretations. , 2009 .

[62]  R. Brennan,et al.  Test Equating, Scaling, and Linking , 2004 .

[63]  J. Gustafsson The Rasch Model in Vertical Equating of Tests: A Critique of Slinde and Linn. , 1979 .

[64]  Seock-Ho Kim,et al.  A Comparison of Linking and Concurrent Calibration Under Item Response Theory , 1996 .

[65]  R. Maruyama,et al.  On Test Scoring , 1927 .

[66]  Brenda H. Loyd,et al.  VERTICAL EQUATING USING THE RASCH MODEL , 1980 .

[67]  Effect of Noncompensatory Multidimensionality on Separate and Concurrent estimation in IRT Observed Score Equating , 2001 .

[68]  Melvin R. Novick,et al.  Some latent train models and their use in inferring an examinee's ability , 1966 .

[69]  R. Brennan,et al.  Test Equating, Scaling, and Linking: Methods and Practices , 2004 .

[70]  Robert W. Lissitz,et al.  IRT Test Equating: Relevant Issues and a Review of Recent Research , 1986 .

[71]  Daniel F. McCaffrey,et al.  The Sensitivity of Value‐Added Teacher Effect Estimates to Different Mathematics Achievement Measures , 2007 .

[72]  David S. Behavioral,et al.  History as Social Science , 1971 .

[73]  Martha L. Stocking,et al.  Developing a Common Metric in Item Response Theory , 1982 .

[74]  Thakur B. Karkee,et al.  Separate versus Concurrent Calibration Methods in Vertical Scaling. , 2003 .

[75]  Michael J. Kolen,et al.  Comparisons of Methodologies and Results in Vertical Scaling for Educational Achievement Tests , 2007 .

[76]  R. Linn,et al.  A NOTE ON VERTICAL EQUATING VIA THE RASCH MODEL FOR GROUPS OF QUITE DIFFERENT ABILITY AND TESTS OF QUITE DIFFERENT DIFFICULTY , 1979 .