The Sensitivity of Value-Added Modeling to the Creation of a Vertical Score Scale

The purpose of this study was to evaluate the sensitivity of growth and value-added modeling to the way an underlying vertical score scale has been created. Longitudinal item-level data, with both student- and school-level identifiers, were analyzed for the entire state of Colorado between 2003 and 2006. Eight different vertical scales were established on the basis of choices made for three key variables: the item response theory modeling approach, the calibration approach, and the student proficiency estimation approach. Each scale represented a methodological approach that was psychometrically defensible. Longitudinal values from each scale were used as the outcome in a commonly used value-added model (the layered model popularized by William Sanders) as a means of estimating school effects. Our findings suggest that while the rank ordering of estimated school effects is insensitive to the underlying vertical scale, the precision of such value-added estimates can be quite sensitive to the combination of choices made in the creation of the scale.
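The eight scales arise from crossing two options on each of the three scale-construction choices. A minimal sketch of that design, using hypothetical option labels (the abstract does not name the specific options, so the labels below are assumptions, not the study's actual conditions):

```python
from itertools import product

# Hypothetical labels for the three scale-construction choices named in the
# abstract; the two options shown per choice are illustrative assumptions.
irt_models = ["1PL", "3PL"]                # IRT modeling approach
calibrations = ["separate", "concurrent"]  # calibration approach
estimators = ["ML", "EAP"]                 # proficiency estimation approach

# Crossing two options per choice yields the eight candidate vertical scales.
scales = [
    {"irt_model": m, "calibration": c, "estimator": e}
    for m, c, e in product(irt_models, calibrations, estimators)
]

print(len(scales))  # 8
```

Each of the eight scale variants would then supply the longitudinal outcome scores fed into the same layered value-added model, so that differences in the resulting school-effect estimates can be attributed to the scaling choices alone.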

[1]  R. Lissitz,et al.  An Evaluation of the Accuracy of Multidimensional IRT Linking , 2000 .

[2]  N. Petersen,et al.  A Test of the Adequacy of Curvilinear Score Equating Models , 1983 .

[3]  Douglas N. Harris,et al.  Would Accountability Based on Teacher Value Added Be Smart Policy? An Examination of the Statistical Properties and Policy Alternatives , 2009, Education Finance and Policy.

[4]  Georg Rasch,et al.  Probabilistic Models for Some Intelligence and Attainment Tests , 1981, The SAGE Encyclopedia of Research Design.

[5]  Daniel F. McCaffrey,et al.  Evaluating Value-Added Models for Teacher Accountability. Monograph. , 2003 .

[6]  Anton Beguin,et al.  Effect of Multidimensionality on Separate and Concurrent estimation in IRT Equating. , 2000 .

[7]  Dale Ballou,et al.  Test Scaling and Value-Added Measurement , 2009, Education Finance and Policy.

[8]  M. R. Novick,et al.  Statistical Theories of Mental Test Scores. , 1971 .

[9]  R. Mislevy Linking Educational Assessments: Concepts, Issues, Methods, and Prospects. , 1992 .

[10]  K. Frank,et al.  The Metric Matters: The Sensitivity of Conclusions About Growth in Student Achievement to Choice of Metric , 1994 .

[11]  Daniel F. McCaffrey,et al.  Bayesian Methods for Scalable Multivariate Value-Added Assessment , 2007 .

[12]  R. Darrell Bock,et al.  Multiple Group IRT , 1997 .

[13]  D. Harris The Policy Uses and “Policy Validity” of Value-Added and Other Teacher Quality Measures , 2008 .

[14]  A. Standen Causes and Effects. , 1957, Science.

[15]  S. Raudenbush Schooling, Statistics, and Poverty: Can We Measure School Improvement?. , 2004 .

[16]  R. Linn,et al.  An Exploration of the Adequacy of the Rasch Model For the Problem of Vertical Equating. , 1978 .

[17]  F. Lord Applications of Item Response Theory To Practical Testing Problems , 1980 .

[18]  Paul Wright,et al.  Controlling for Student Background in Value-Added Assessment of Teachers , 2004 .

[19]  T. C. Oshima,et al.  Multidimensional Linking: Four Practical Approaches , 2000 .

[20]  Scale Shrinkage in Vertical Equating , 1993 .

[21]  Mark R. Wilson,et al.  On choosing a model for measuring , 2003 .

[22]  Carol L Cone,et al.  Causes and effects. , 2003, Harvard business review.

[23]  Anthony S. Bryk,et al.  Hierarchical Linear Models: Applications and Data Analysis Methods , 1992 .

[24]  R. D. Bock,et al.  Adaptive EAP Estimation of Ability in a Microcomputer Environment , 1982 .

[25]  SAMPLING OF COMMON ITEMS: AN UNRECOGNIZED SOURCE OF ERROR IN TEST EQUATING1 , 2004 .

[26]  F. Modigliani,et al.  DIVIDEND POLICY, GROWTH, AND THE VALUATION OF SHARES , 1961 .

[27]  G. Masters A rasch model for partial credit scoring , 1982 .

[28]  E. Muraki A Generalized Partial Credit Model: Application of an EM Algorithm , 1992 .

[29]  R. L. Lim Linking Results of Distinct Assessments , 1993 .

[30]  Edward H. Haertel,et al.  Sampling of Common Items: An Unrecognized Source of Error in Test Equating. CSE Report 636. , 2004 .

[31]  Terry A. Ackerman A Didactic Explanation of Item Bias, Item Impact, and Item Validity from a Multidimensional Perspective , 1992 .

[32]  Donald B. Rubin,et al.  A Potential Outcomes View of Value-Added Assessment in Education , 2004 .

[33]  M. J. Kolen COMPARISON OF TRADITIONAL AND ITEM RESPONSE THEORY METHODS FOR EQUATING TESTS , 1981 .

[34]  Dorothy T. Thayer,et al.  The Kernel Method of Test Equating , 2003 .

[35]  Education Professional Standards Board Evaluating Value-Added models for Teacher Accountability , 2004 .

[36]  David Thissen,et al.  Item Response Theory for Items Scored in Two Categories , 2001 .

[37]  Henry Braun Using Student Progress to Evaluate Teachers: A Primer on Value-Added Models. Policy Information Perspective. , 2005 .

[38]  Robert W. Lissitz,et al.  Vertical Equating for State Assessments: Issues and Solutions in Determination of Adequate Yearly Progress and School Accountability. , 2003 .

[39]  R. D. Bock,et al.  Marginal maximum likelihood estimation of item parameters , 1982 .

[40]  G. Camilli,et al.  Scale Shrinkage and the Estimation of Latent Distribution Parameters , 1988 .

[41]  Mark Wilson,et al.  Constructing Measures: An Item Response Modeling Approach , 2004 .

[42]  Benjamin D. Wright,et al.  A History of Social Science Measurement , 2005 .

[43]  F. Baker,et al.  Item response theory : parameter estimation techniques , 1993 .

[44]  R. D. Bock,et al.  Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm , 1981 .

[45]  Anton A. Béguin,et al.  Obtaining a Common Scale for Item Response Theory Item Parameters Using Separate Versus Concurrent Estimation in the Common-Item Equating Design , 2002 .

[46]  V. S. Williams,et al.  A Comparison of Developmental Scales Based on Thurstone Methods and Item Response Theory , 1998 .

[47]  Linda L. Cook,et al.  Irt Versus Conventional Equating Methods: A Comparative Study of Scale Stability , 1983 .

[48]  Thomas A Louis,et al.  Jump down to Document , 2022 .

[49]  Robert L. Linn,et al.  The Rasch Model, Objective Measurement, Equating, and Robustness , 1979 .

[50]  J. Singer,et al.  Applied Longitudinal Data Analysis , 2003 .

[51]  Susan E. Holmes UNIDIMENSIONALITY AND VERTICAL EQUATING WITH THE RASCH MODEL , 1982 .

[52]  Frank B. Baker,et al.  Item Response Theory : Parameter Estimation Techniques, Second Edition , 2004 .

[53]  Daniel M. Lewis,et al.  Aligning Policy and Methodology to Achieve Consistent Across-Grade Performance Standards , 2005 .

[54]  Joseph A. Martineau Distorting Value Added: The Use of Longitudinal, Vertically Scaled Student Achievement Data for Growth-Based, Value-Added Accountability , 2006 .

[55]  D. R. Divgi Model-Free Evaluation of Equating and Scaling , 1981 .

[56]  W. M. Yen Increasing item complexity: A possible cause of scale shrinkage for unidimensional item response theory , 1985 .

[57]  A. A. Davier,et al.  Test equating, scaling, and linking. Methods and practices , 2006 .

[58]  E. Muraki A Generalized Partial Credit Model , 1997 .

[59]  R. C. Sykes,et al.  Concurrent and Separate Grade-Groups Linking Procedures for Vertical Scaling , 2008 .

[60]  Wendy M. Yen,et al.  THE CHOICE OF SCALE FOR EDUCATIONAL MEASUREMENT: AN IRT PERSPECTIVE , 1986 .

[61]  Derek C. Briggs,et al.  The Impact of Vertical Scaling Decisions on Growth Interpretations. , 2009 .

[62]  R. Brennan,et al.  Test Equating, Scaling, and Linking , 2004 .

[63]  J. Gustafsson The Rasch Model in Vertical Equating of Tests: A Critique of Slinde and Linn. , 1979 .

[64]  Seock-Ho Kim,et al.  A Comparison of Linking and Concurrent Calibration Under Item Response Theory , 1996 .

[65]  R. Maruyama,et al.  On Test Scoring , 1927 .

[66]  Brenda H. Loyd,et al.  VERTICAL EQUATING USING THE RASCH MODEL , 1980 .

[67]  Effect of Noncompensatory Multidimensionality on Separate and Concurrent estimation in IRT Observed Score Equating , 2001 .

[68]  Melvin R. Novick,et al.  Some latent train models and their use in inferring an examinee's ability , 1966 .

[69]  R. Brennan,et al.  Test Equating, Scaling, and Linking: Methods and Practices , 2004 .

[70]  Robert W. Lissitz,et al.  IRT Test Equating: Relevant Issues and a Review of Recent Research , 1986 .

[71]  Daniel F. McCaffrey,et al.  The Sensitivity of Value‐Added Teacher Effect Estimates to Different Mathematics Achievement Measures , 2007 .

[72]  David S. Behavioral,et al.  History as Social Science , 1971 .

[73]  Martha L. Stocking,et al.  Developing a Common Metric in Item Response Theory , 1982 .

[74]  Thakur B. Karkee,et al.  Separate versus Concurrent Calibration Methods in Vertical Scaling. , 2003 .

[75]  Michael J. Kolen,et al.  Comparisons of Methodologies and Results in Vertical Scaling for Educational Achievement Tests , 2007 .

[76]  R. Linn,et al.  A NOTE ON VERTICAL EQUATING VIA THE RASCH MODEL FOR GROUPS OF QUITE DIFFERENT ABILITY AND TESTS OF QUITE DIFFERENT DIFFICULTY , 1979 .