Using the Stan Program for Bayesian Item Response Theory

Stan is a new Bayesian statistical software program that implements the powerful and efficient Hamiltonian Monte Carlo (HMC) algorithm. To date, no source systematically provides Stan code for item response theory (IRT) models. This article provides Stan code for three representative IRT models: the three-parameter logistic IRT model, the graded response model, and the nominal response model. We demonstrate how IRT model comparison can be conducted with Stan and how the provided Stan code for simple IRT models can be extended to their multidimensional and multilevel cases.
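As a point of orientation for the first of the three models, the sketch below evaluates the standard three-parameter logistic item response function in Python. It is an illustration only, not the article's Stan code. The function name and the parameter names theta (ability), a (discrimination), b (difficulty), and c (pseudo-guessing) are assumptions chosen for this example.

```python
import math


def prob_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: person ability
    a: item discrimination
    b: item difficulty
    c: pseudo-guessing parameter (lower asymptote), between 0 and 1
    (all names are illustrative assumptions, not taken from the article)
    """
    # Logistic curve in a * (theta - b), rescaled to lie between c and 1.
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # When ability equals difficulty, the probability sits halfway
    # between the guessing floor c and 1.
    print(prob_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2))
```

In a Bayesian fit, this probability would typically serve as the success probability of a Bernoulli likelihood for each person-item response, with priors placed on theta, a, b, and c.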
