TKK Reports in Information and Computer Science
Practical Approaches to Principal Component Analysis in the Presence of Missing Values

Principal component analysis (PCA) is a classical data analysis technique that finds linear transformations of the data which retain the maximal amount of variance. We study the case where some of the data values are missing, and show that this problem has many features usually associated with nonlinear models, such as overfitting and poor locally optimal solutions. A probabilistic formulation of PCA provides a good foundation for handling missing values, and we provide formulas for doing so. For high-dimensional and very sparse data, overfitting becomes a severe problem and traditional algorithms for PCA are very slow. We introduce a novel fast algorithm and extend it to variational Bayesian learning. Different versions of PCA are compared in artificial experiments, demonstrating the effects of regularization and modeling of posterior variance. The scalability of the proposed algorithm is demonstrated by applying it to the Netflix problem.
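
To make the setting concrete, here is a minimal sketch of PCA with missing values via alternating regularized least squares, in the spirit of the approach the abstract describes; it is not the paper's algorithm. All names (pca_missing, reg, the nan-masking convention) are this sketch's own assumptions, and the L2 penalty stands in for the regularization the abstract says is needed to curb overfitting on sparse data.

```python
import numpy as np

def pca_missing(Y, n_components, n_iters=100, reg=0.1, seed=0):
    """Sketch: PCA on a (d x n) matrix Y with np.nan marking missing entries.

    Models the observed entries as Y ~ W @ X + m, alternating closed-form
    ridge-regression updates of the loadings W and the components X. The
    small L2 penalty `reg` curbs the overfitting that arises when the
    observed entries are sparse.
    """
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    O = ~np.isnan(Y)                       # mask of observed entries
    Yf = np.where(O, Y, 0.0)               # zero-filled copy for safe sums

    # Bias term: per-row mean over the observed entries only.
    m = Yf.sum(axis=1) / np.maximum(O.sum(axis=1), 1)
    R = np.where(O, Y - m[:, None], 0.0)   # centered residuals, zeros at gaps

    W = rng.standard_normal((d, n_components))
    X = rng.standard_normal((n_components, n))
    I = np.eye(n_components)

    for _ in range(n_iters):
        # Update each column of X from the rows observed in that column.
        for j in range(n):
            o = O[:, j]
            Wo = W[o]
            X[:, j] = np.linalg.solve(Wo.T @ Wo + reg * I, Wo.T @ R[o, j])
        # Update each row of W from the columns observed in that row.
        for i in range(d):
            o = O[i]
            Xo = X[:, o]
            W[i] = np.linalg.solve(Xo @ Xo.T + reg * I, Xo @ R[i, o])

    return W, X, m  # reconstruct missing entries as W @ X + m[:, None]
```

Usage: given a partially observed matrix, W, X, m = pca_missing(Y, 10) yields a rank-10 reconstruction W @ X + m[:, None] whose values at the unobserved positions serve as imputations. The per-column and per-row loops make the cost explicit; the paper's fast algorithm and its variational Bayesian extension address exactly the scalability and overfitting limits of this naive scheme.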
