An improved stochastic EM algorithm for large-scale full-information item factor analysis.

In this paper, we explore the use of the stochastic EM algorithm (Celeux & Diebolt, 1985, Computational Statistics Quarterly, 2, 73) for large-scale full-information item factor analysis. We introduce several innovations in its implementation: an adaptive-rejection-based Gibbs sampler for the stochastic E step, a proximal gradient descent algorithm for the optimization in the M step, and diagnostic procedures for determining the burn-in size and for deciding when to stop the algorithm. These developments build on the theoretical results of Nielsen (2000, Bernoulli, 6, 457), as well as on advanced sampling and optimization techniques. The proposed algorithm is computationally efficient and virtually tuning-free, which makes it scalable to large data sets with many latent traits (e.g. more than five) and easy for practitioners to use. Standard errors of the parameter estimates are also obtained from the missing-information identity (Louis, 1982, Journal of the Royal Statistical Society, Series B, 44, 226). The performance of the algorithm is evaluated through simulation studies and an application to the IPIP-NEO personality inventory. Extensions of the proposed algorithm to other latent variable models are discussed.
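The general shape of a stochastic EM iteration (impute the latent traits by sampling, maximize the completed-data likelihood, then average the post-burn-in iterates) can be illustrated on a toy problem. The sketch below is not the algorithm of the paper. It assumes a one-factor linear model with unit error variances, y_ij = a_j * theta_i + e_ij, for which the posterior of theta_i is normal and the M step has a closed form, so neither the adaptive-rejection Gibbs sampler nor the proximal gradient step is needed. The function names `simulate` and `stochastic_em`, the fixed burn-in, and all default values are assumptions made for illustration.

```python
import random


def simulate(n, loadings, seed=1):
    """Simulate n response vectors from the toy model y_ij = a_j * theta_i + e_ij."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)  # latent trait, standard normal prior
        data.append([a * theta + rng.gauss(0.0, 1.0) for a in loadings])
    return data


def stochastic_em(y, n_iter=200, burn_in=100, seed=0):
    """Toy stochastic EM: sample the latent traits, update the loadings, average."""
    rng = random.Random(seed)
    n_items = len(y[0])
    a = [0.5] * n_items    # starting values (an arbitrary choice)
    avg = [0.0] * n_items  # average of the iterates after burn-in
    for t in range(n_iter):
        # Stochastic E step: one draw of each theta_i from its exact
        # normal posterior N(a'y_i / (1 + a'a), 1 / (1 + a'a)).
        prec = 1.0 + sum(aj * aj for aj in a)
        sd = prec ** -0.5
        theta = [
            rng.gauss(sum(aj * yij for aj, yij in zip(a, row)) / prec, sd)
            for row in y
        ]
        # M step: closed-form least squares given the imputed traits.
        ss = sum(th * th for th in theta)
        a = [
            sum(th * row[j] for th, row in zip(theta, y)) / ss
            for j in range(n_items)
        ]
        # Final estimate: average of the iterates after a fixed burn-in
        # (the paper chooses burn-in and stopping by diagnostics instead).
        if t >= burn_in:
            for j in range(n_items):
                avg[j] += a[j] / (n_iter - burn_in)
    return avg


if __name__ == "__main__":
    true_loadings = [1.0, 0.8, 0.6, 0.4]
    estimates = stochastic_em(simulate(1000, true_loadings))
    print([round(e, 2) for e in estimates])
```

In an item factor model with categorical responses the posterior of the latent traits is no longer available in closed form, which is where a Gibbs sampler in the E step and an iterative optimizer in the M step come in.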

[1]  S. Monroe Multidimensional Item Factor Analysis with Semi-Nonparametric Latent Densities , 2014 .

[2]  Thomas Brox,et al.  Maximum Likelihood Estimation , 2019, Time Series Analysis.

[3]  B. Muthén A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators , 1984 .

[4]  Melvin R. Novick,et al.  Some latent trait models and their use in inferring an examinee's ability , 1966 .

[5]  G. Masters A Rasch model for partial credit scoring , 1982 .

[6]  Herman Rubin,et al.  Statistical Inference in Factor Analysis , 1956 .

[7]  Yasumasa Fujisaki,et al.  A stopping rule for stochastic approximation , 2015, Autom..

[8]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[9]  Zhiliang Ying,et al.  Latent Variable Selection for Multidimensional Item Response Theory Models via $L_1$ Regularization , 2016 .

[10]  Sik-Yum Lee,et al.  A MULTIVARIATE PROBIT LATENT VARIABLE MODEL FOR ANALYZING DICHOTOMOUS RESPONSES , 2005 .

[11]  H. Robbins A Stochastic Approximation Method , 1951 .

[12]  Tim Hesterberg,et al.  Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control , 2004, Technometrics.

[13]  S. Nielsen The stochastic EM algorithm: estimation and asymptotic results , 2000 .

[14]  R. D. Bock,et al.  Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm , 1981 .

[15]  John A. Johnson Measuring thirty facets of the Five Factor Model with a 120-item public domain inventory: Development of the IPIP-NEO-120 , 2014 .

[16]  A. Béguin,et al.  MCMC estimation and some model-fit analysis of multidimensional IRT models , 2001 .

[17]  R. Philip Chalmers,et al.  mirt: A Multidimensional Item Response Theory Package for the R Environment , 2012 .

[18]  É. Moulines,et al.  Convergence of a stochastic approximation version of the EM algorithm , 1999 .

[19]  Li Cai,et al.  Metropolis-Hastings Robbins-Monro Algorithm for Confirmatory Item Factor Analysis , 2010 .

[20]  Edoardo M. Airoldi,et al.  Mixed Membership Stochastic Blockmodels , 2007, NIPS.

[21]  J. Revuelta Multidimensional Item Response Model for Nominal Variables , 2014 .

[22]  H. Joe,et al.  Composite likelihood estimation in multivariate data analysis , 2005 .

[23]  Matthias von Davier New Results on an Improved Parallel EM Algorithm for Estimating Generalized Latent Variable Models , 2016 .

[24]  Radford M. Neal Slice Sampling , 2003, The Annals of Statistics.

[25]  John A. Johnson Ascertaining the validity of individual protocols from Web-based personality inventories. , 2005 .

[26]  S. Rabe-Hesketh,et al.  Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects , 2005 .

[27]  M. Reckase Multidimensional Item Response Theory , 2009 .

[28]  L. Dagum,et al.  OpenMP: an industry standard API for shared-memory programming , 1998 .

[29]  Elvezio Ronchetti,et al.  Estimation of generalized linear latent variable models , 2004 .

[30]  Haruhiko Ogasawara Marginal Maximum Likelihood Estimation of Item Response Theory (IRT) Equating Coefficients for the Common-examinee Design , 2001 .

[31]  Yang Liu,et al.  Multidimensional Item Response Theory , 2018 .

[32]  James E. Carlson,et al.  Full-Information Factor Analysis for Polytomous Item Responses , 1995 .

[33]  S. Walker Invited comment on the paper "Slice Sampling" by Radford Neal , 2003 .

[34]  Edward H. Ip,et al.  Stochastic EM: method and application , 1996 .

[35]  조용현,et al.  Performance Improvement of Neural Networks for Optimization Using a Stochastic Approximation Algorithm , 1992 .

[36]  Li Cai,et al.  HIGH-DIMENSIONAL EXPLORATORY ITEM FACTOR ANALYSIS BY A METROPOLIS–HASTINGS ROBBINS–MONRO ALGORITHM , 2010 .

[37]  Xiao-Li Meng,et al.  Fitting Full-Information Item Factor Models and an Empirical Investigation of Bridge Sampling , 1996 .

[38]  Alexander Shapiro,et al.  Stochastic Approximation approach to Stochastic Programming , 2013 .

[39]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[40]  N. Thomas,et al.  Asymptotic Corrections for Multivariate Posterior Moments with Factored Likelihood Functions , 1993 .

[41]  W. R. Gilks,et al.  Adaptive Rejection Sampling for Gibbs Sampling , 2010 .

[42]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[43]  B. Muthén Contributions to factor analysis of dichotomous variables , 1978 .

[44]  R. Kass,et al.  Approximate Bayesian Inference in Conditionally Independent Hierarchical Models (Parametric Empirical Bayes Models) , 1989 .

[45]  Jean-Paul Fox,et al.  An Aggregate IRT Procedure for Exploratory Factor Analysis , 2015 .

[46]  Stephen P. Boyd,et al.  Proximal Algorithms , 2013, Found. Trends Optim..

[47]  J. Albert Bayesian Estimation of Normal Ogive Item Response Curves Using Gibbs Sampling , 1992 .

[48]  T. Louis Finding the Observed Information Matrix When Using the EM Algorithm , 1982 .

[49]  Jian Qing Shi,et al.  Bayesian sampling‐based approach for factor analysis models with continuous and polytomous data , 1998 .

[50]  K. Jöreskog,et al.  Factor Analysis of Ordinal Variables: A Comparison of Three Approaches , 2001, Multivariate behavioral research.

[51]  Piers Steel,et al.  Refining the relationship between personality and subjective well-being. , 2008, Psychological bulletin.

[52]  Edward H. Ip,et al.  On Single Versus Multiple Imputation for a Class of Stochastic Algorithms Estimating Maximum Likelihood , 2002, Comput. Stat..

[53]  Richard J. Patz,et al.  A Straightforward Approach to Markov Chain Monte Carlo Methods for Item Response Models , 1999 .

[54]  Yang Liu A Riemannian Optimization Algorithm for Joint Maximum Likelihood Estimation of High-Dimensional Exploratory Item Factor Analysis , 2020, Psychometrika.

[55]  J-P Fox Stochastic EM for estimating the parameters of a multilevel IRT model. , 2003, The British journal of mathematical and statistical psychology.

[56]  F. Kong,et al.  A stochastic approximation algorithm with Markov chain Monte Carlo method for incomplete data estimation problems. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[57]  Stochastic Approximation Methods for Latent Regression Item Response Models , 2010 .

[58]  R. D. Bock,et al.  High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature , 2005 .

[59]  Brian W. Junker,et al.  Applications and Extensions of MCMC in IRT: Multiple Item Types, Missing Data, and Rated Responses , 1999 .

[60]  Lihua Yao,et al.  A Multidimensional Partial Credit Model With Associated Item and Test Statistics: An Application to Mixed-Format Tests , 2006 .

[61]  Kristopher J Preacher,et al.  Item factor analysis: current approaches and future directions. , 2007, Psychological methods.

[62]  Sylvia Richardson,et al.  Markov chain concepts related to sampling algorithms , 1995 .

[63]  John Geweke,et al.  Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments , 1991 .

[64]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .

[65]  A. Agresti An introduction to categorical data analysis , 1997 .

[66]  R. Darrell Bock,et al.  Estimating item parameters and latent ability when responses are scored in two or more nominal categories , 1972 .

[67]  D. M. Hutton,et al.  The Art of Multiprocessor Programming , 2008 .

[68]  Michael C. Edwards,et al.  A Markov Chain Monte Carlo Approach to Confirmatory Item Factor Analysis , 2010 .

[69]  D. Rubin,et al.  Inference from Iterative Simulation Using Multiple Sequences , 1992 .