Determining the Number of Factors in High-Dimensional Generalized Latent Factor Models

Generalized latent factor models extend the classical linear factor model and are useful for analyzing multivariate data of different types, including binary choices and counts. This paper proposes an information criterion for determining the number of factors in such models. The consistency of the criterion is established in a high-dimensional setting where both the sample size and the number of manifest variables grow to infinity and the data may contain many missing values. An error bound is established for the parameter estimates, and it plays an important role in proving this consistency. The bound improves on several existing results and may be of independent theoretical interest. We evaluate the proposed method through a simulation study and an application to Eysenck’s personality questionnaire.
