Principal Component Analysis for Sparse High-Dimensional Data

Principal component analysis (PCA) is a widely used technique for data analysis and dimensionality reduction. Eigenvalue decomposition of the data covariance matrix is the standard way to compute PCA, but a number of other algorithms have been proposed; the EM algorithm, for instance, is much more efficient when the data are high-dimensional and only a few principal components are needed. We study the case where the data are high-dimensional and a majority of the values are missing. In this setting, both of these algorithms turn out to be inadequate. We propose a gradient descent algorithm inspired by Oja's rule, and speed it up with an approximate Newton's method. The computational complexity of the proposed method is linear in the number of observed values in the data and in the number of principal components. In experiments on the Netflix data, the proposed algorithm is about ten times faster than any of the four comparison methods.
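As a rough illustration of the approach (a minimal sketch, not the authors' implementation), the Python code below runs plain gradient descent on the squared reconstruction error over the observed entries only, factoring X ≈ A Sᵀ. The function name `pca_missing_gd` and all parameter values are hypothetical; the approximate-Newton speed-up described in the abstract, which would rescale each update by an estimate of the corresponding second derivative, is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_missing_gd(rows, cols, vals, shape, n_components, n_epochs=200, lr=0.01):
    """Fit X ~= A @ S.T using only the observed entries x_ij.

    rows, cols, vals : 1-D arrays listing the observed entries of X.
    Each epoch costs O(#observed * n_components), matching the linear
    complexity claimed in the abstract.
    """
    d, n = shape
    A = 0.1 * rng.standard_normal((d, n_components))  # loadings, one row per data dimension
    S = 0.1 * rng.standard_normal((n, n_components))  # components, one row per sample
    for _ in range(n_epochs):
        # Residuals on observed entries only: e_ij = x_ij - a_i . s_j
        err = vals - np.sum(A[rows] * S[cols], axis=1)
        # Gradients of 0.5 * sum_ij e_ij^2, accumulated entry by entry
        gA = np.zeros_like(A)
        gS = np.zeros_like(S)
        np.add.at(gA, rows, err[:, None] * S[cols])
        np.add.at(gS, cols, err[:, None] * A[rows])
        A += lr * gA
        S += lr * gS
    return A, S

# Toy usage: a 100 x 80 matrix with roughly 10% of the entries observed.
d, n, c = 100, 80, 3
A_true = rng.standard_normal((d, c))
S_true = rng.standard_normal((n, c))
mask = rng.random((d, n)) < 0.1
rows, cols = np.nonzero(mask)
vals = np.sum(A_true[rows] * S_true[cols], axis=1)
A_hat, S_hat = pca_missing_gd(rows, cols, vals, (d, n), c)
print(np.mean((vals - np.sum(A_hat[rows] * S_hat[cols], axis=1)) ** 2))
```

Note that the loop touches only the observed entries, never the full d × n matrix, which is what makes this family of methods viable when most values are missing.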
