A BAYESIAN HIERARCHICAL MODEL FOR LARGE-SCALE EDUCATIONAL SURVEYS: AN APPLICATION TO THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS

Large-scale educational assessments such as the National Assessment of Educational Progress (NAEP) sample examinees to whom an exam will be administered. In most situations the sampling design is not a simple random sample and must be accounted for in the estimating model. After reviewing the current operational estimation procedure for NAEP, this paper describes a Bayesian hierarchical model for the analysis of complex large-scale assessments. The model clusters students within schools and schools within primary sampling units. The paper discusses an estimation procedure that uses a Markov chain Monte Carlo algorithm to approximate the posterior distribution of the model parameters. Results from two Bayesian models, one treating item parameters as known and one treating them as unknown, are compared to results from the current operational method on a simulated data set and on a subset of data from the 1998 NAEP reading assessment. The point estimates from the Bayesian model and the operational method are quite similar in most cases, but there do appear to be systematic differences in measures of uncertainty (e.g., standard errors, confidence intervals).
