Learning a factor model via regularized PCA

We consider the problem of learning a linear factor model. We propose a regularized form of principal component analysis (PCA) and demonstrate, through experiments with synthetic and real data, the superiority of the resulting estimates over those produced by existing factor analysis approaches. We also establish theoretical results that explain how our algorithm corrects the biases induced by conventional approaches. An important feature of our algorithm is that its computational requirements are similar to those of PCA, which enjoys wide use in large part due to its efficiency.
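The paper's exact regularization scheme is not reproduced here, but the general shape of the approach can be sketched. The code below is a minimal illustration, not the authors' algorithm: it fits a k-factor model by eigendecomposing the sample covariance and shrinking the leading eigenvalues toward an estimated noise floor before forming the loadings. The function name fit_factor_model, the shrinkage parameter gamma, and the tail-average noise estimate are all illustrative assumptions.

```python
# Sketch: regularized-PCA-style factor model estimation.
# The shrinkage used here (soft-thresholding of the leading sample-covariance
# eigenvalues) is a hypothetical stand-in for the paper's regularizer.

import numpy as np

def fit_factor_model(X, k, gamma=0.1):
    """Estimate a k-factor model Sigma ~= L @ L.T + diag(psi).

    X     : (n, p) data matrix, rows are observations
    k     : number of latent factors
    gamma : eigenvalue shrinkage level (assumed form of regularization)
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                      # center the data
    S = (Xc.T @ Xc) / n                          # sample covariance
    evals, evecs = np.linalg.eigh(S)             # ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]   # reorder to descending

    sigma2 = evals[k:].mean()                    # noise level from the tail
    # Shrink the leading eigenvalues toward the noise floor; this is the
    # "regularization" step in this sketch (hypothetical choice of penalty).
    shrunk = np.maximum(evals[:k] - gamma, sigma2)

    # Loadings in the probabilistic-PCA form L = V_k (Lambda_k - sigma^2 I)^{1/2}
    L = evecs[:, :k] * np.sqrt(np.maximum(shrunk - sigma2, 0.0))
    # Idiosyncratic variances as the residual diagonal, floored for stability
    psi = np.maximum(np.diag(S) - (L ** 2).sum(axis=1), 1e-8)
    return L, psi
```

As in probabilistic PCA, the average of the trailing eigenvalues serves as an estimate of the idiosyncratic noise variance. The cost of this sketch is dominated by a single eigendecomposition of the sample covariance, which is consistent with the abstract's claim that the algorithm's computational requirements are similar to those of PCA.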
