A Bootstrapping Assessment on A U.S. Education Indicator Construction Through Multiple Imputation

Under a matrix sampling design, no student completes all test booklets in the National Assessment of Educational Progress (NAEP). To construct an education indicator of what students know and can do, multiple imputation (MI) is used to compute plausible values (PVs) from student responses to a subset of the questions. Since 2013, NAEP has increased the number of imputed PVs from five to 20. One purpose of this investigation is to examine the impact of this change on indicator reporting. An R algorithm was created to compute bootstrap standard errors of the PV distribution. The results show that the 20-imputation setting reduced the standard error and improved normality compared with the five-imputation setting. Although the bootstrap technique is typically set to generate 1000 resamples, the findings further indicate that increasing the number of resamples is unlikely to reduce the standard error estimate.
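To make the bootstrap step concrete, the following is a minimal sketch in Python rather than the study's R algorithm, which is not reproduced here. It assumes the statistic of interest is the mean of a set of plausible values and estimates its standard error by resampling with replacement. The function name `bootstrap_se`, the simulated values, and the scale parameters (mean 250, standard deviation 35) are all hypothetical and are not NAEP data.

```python
import random
import statistics


def bootstrap_se(values, n_resamples=1000, seed=0):
    """Bootstrap standard error of the mean of `values`.

    Draws `n_resamples` resamples with replacement, each the same size
    as `values`, and returns the standard deviation of the resample means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


# Simulated plausible values (hypothetical, NOT NAEP data).
rng = random.Random(42)
pv_5 = [rng.gauss(250, 35) for _ in range(5)]    # five-imputation setting
pv_20 = [rng.gauss(250, 35) for _ in range(20)]  # 20-imputation setting

se_5 = bootstrap_se(pv_5)
se_20 = bootstrap_se(pv_20)

# Same data, more resamples: the estimate is steadier but not
# systematically smaller.
se_20_more = bootstrap_se(pv_20, n_resamples=5000)

print(f"SE with 5 PVs:  {se_5:.2f}")
print(f"SE with 20 PVs: {se_20:.2f}")
print(f"SE with 20 PVs, 5000 resamples: {se_20_more:.2f}")
```

In this sketch the number of resamples only affects how precisely the standard error is estimated, while the number of plausible values determines the quantity being estimated, which is consistent with the distinction the abstract draws.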
