Vintage Factor Analysis with Varimax Performs Statistical Inference

Psychologists developed Multiple Factor Analysis to decompose multivariate data into a small number of interpretable factors without any a priori knowledge about those factors. In this form of factor analysis, the Varimax "factor rotation" is a key step in making the factors interpretable. Charles Spearman and many others objected to factor rotations because the factors seem to be rotationally invariant. These objections are still reported in all contemporary multivariate statistics textbooks. This is an enigma: this vintage form of factor analysis has survived and remains widely popular because, empirically, the factor rotation often makes the factors easier to interpret. We argue that the rotation makes the factors easier to interpret because, in fact, the Varimax factor rotation performs statistical inference. We show that Principal Components Analysis (PCA) with the Varimax rotation provides a unified spectral estimation strategy for a broad class of modern factor models, including the Stochastic Blockmodel and a natural variation of Latent Dirichlet Allocation (i.e., "topic modeling"). In addition, we show that Thurstone's widely employed sparsity diagnostics implicitly assess a key "leptokurtic" condition that makes the rotation statistically identifiable in these models. Taken together, these results show that the know-how of Vintage Factor Analysis performs statistical inference, reversing nearly a century of statistical thinking on the topic. With a sparse eigensolver, PCA with Varimax is both fast and stable. Combined with Thurstone's straightforward diagnostics, this vintage approach is suitable for a wide array of modern applications.
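
The two-step procedure described above (PCA followed by a Varimax rotation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a dense SVD in place of a sparse eigensolver, column centering before the SVD, and the commonly used SVD-based fixed-point iteration for Varimax. The function names (`varimax`, `varimax_criterion`, `pca_varimax`) and the simulated data with sparse, heavy-tailed ("leptokurtic") factors are hypothetical and chosen only for illustration.

```python
import numpy as np


def varimax_criterion(Z):
    """Kaiser's raw varimax criterion: the variance of the squared
    entries within each column, summed over columns."""
    return float(np.sum(np.var(Z ** 2, axis=0)))


def varimax(U, max_iter=500, tol=1e-10):
    """Return an orthogonal k x k matrix R so that U @ R has a large
    varimax criterion, using the common SVD-based iteration."""
    n, k = U.shape
    R = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        Z = U @ R
        # Gradient-like matrix of the varimax criterion at the current rotation.
        G = U.T @ (Z ** 3 - Z * np.mean(Z ** 2, axis=0))
        A, s, Bt = np.linalg.svd(G)
        R = A @ Bt  # nearest orthogonal matrix to G (polar factor)
        if s.sum() < prev * (1 + tol):
            break
        prev = s.sum()
    return R


def pca_varimax(X, k):
    """PCA (via a dense SVD of the column-centered data) followed by Varimax.
    Returns the leading k left singular vectors, the rotation, and their product."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U_k = U[:, :k]
    R = varimax(U_k)
    return U_k, R, U_k @ R


# Simulated example (assumed setup): sparse, nonnegative latent factors Z,
# Gaussian loadings Y, and a small amount of additive noise.
rng = np.random.default_rng(0)
n, d, k = 1000, 40, 3
Z = rng.exponential(size=(n, k)) * (rng.random((n, k)) < 0.2)
Y = rng.normal(size=(d, k))
X = Z @ Y.T + 0.1 * rng.normal(size=(n, d))

U_k, R, Z_hat = pca_varimax(X, k)
print("criterion before rotation:", varimax_criterion(U_k))
print("criterion after rotation: ", varimax_criterion(Z_hat))
```

In this sketch the unrotated principal components `U_k` are only determined up to an orthogonal rotation, and the rotation `R` is chosen to make the columns of `U_k @ R` as "spiky" as possible. For large sparse inputs, a sparse solver such as `scipy.sparse.linalg.svds` could replace the dense SVD.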
