Scaling Procedures in NAEP

Scale-score reporting is a recent innovation in the National Assessment of Educational Progress (NAEP). With scaling methods, the performance of a sample of students in a subject area or subarea can be summarized on a single scale even when different students have been administered different exercises. This article presents an overview of the scaling methodologies employed in the analyses of NAEP surveys beginning with 1984. The first section discusses the perspective on scaling from which the procedures were conceived and applied. The plausible values methodology developed for use in NAEP scale-score analyses is then described, in the contexts of item response theory and average response method scaling. The concluding section lists milestones in the evolution of the plausible values approach in NAEP and directions for further improvement.
