Generalized Principal Component Analysis: Projection of Saturated Model Parameters

Abstract: Principal component analysis (PCA) is useful for a wide variety of data analysis tasks, but its implicit connection to the Gaussian distribution can be undesirable for discrete data such as binary and multi-category responses or counts. We generalize PCA to handle various types of data using the generalized linear model framework. In contrast to the existing approach of matrix factorization for exponential family data, our generalized PCA provides low-rank estimates of the natural parameters by projecting the saturated model parameters. This formulation has two favorable properties: the number of parameters does not grow with the sample size, and principal component scores on new data can be computed by simple matrix multiplication. A practical algorithm that can incorporate missing data and case weights is developed for finding the projection matrix. Examples on simulated and real count data show the improvement of generalized PCA over standard PCA for matrix completion, visualization, and collaborative filtering. Supplementary material for this article is available online.
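To make the projection idea concrete, here is a minimal sketch for count (Poisson) data. All variable names are illustrative, the zero counts are handled with a small offset purely for demonstration, and the projection matrix U is taken from an ordinary SVD of the saturated parameters as a stand-in; the paper's actual method chooses U by minimizing a GLM deviance under orthogonality constraints rather than by this eigendecomposition shortcut.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(lam=3.0, size=(100, 20)).astype(float)  # n x d count matrix

# Saturated-model natural parameters for the Poisson family: theta = log(x).
# Zeros are offset to 0.5 here only so the log is defined in this sketch.
theta_sat = np.log(np.maximum(X, 0.5))

k = 2                                           # target rank
mu = theta_sat.mean(axis=0)                     # column main effects
centered = theta_sat - mu

# Illustrative projection matrix: top-k right singular vectors of the
# centered saturated parameters (a stand-in for the deviance-minimizing U).
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
U = Vt[:k].T                                    # d x k, orthonormal columns

# Property highlighted in the abstract: scores for new data are obtained by
# a single matrix multiplication of the (centered) saturated parameters.
X_new = rng.poisson(lam=3.0, size=(5, 20)).astype(float)
theta_new = np.log(np.maximum(X_new, 0.5))
scores_new = (theta_new - mu) @ U               # 5 x k component scores

# Low-rank estimate of the natural parameters and the implied count means.
theta_hat = mu + scores_new @ U.T
means_hat = np.exp(theta_hat)
print(scores_new.shape, means_hat.shape)
```

Because the fitted object is a projection of the saturated parameters rather than a factorization with per-observation factors, scoring new rows requires no additional optimization, only the multiplication shown above.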
