Probabilistic Principal Component Analysis

Principal component analysis (PCA) is a ubiquitous technique for data analysis and processing, but one which is not based on a probability model. We demonstrate how the principal axes of a set of observed data vectors may be determined through maximum likelihood estimation of parameters in a latent variable model that is closely related to factor analysis. We consider the properties of the associated likelihood function, giving an EM algorithm for estimating the principal subspace iteratively, and discuss, with illustrative examples, the advantages conveyed by this probabilistic approach to PCA.
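As an illustration of the kind of EM procedure described here, the following is a minimal sketch and not necessarily the exact algorithm the paper gives. It assumes the usual isotropic-noise latent variable model, t = W x + mu + e with x ~ N(0, I) and e ~ N(0, sigma^2 I). The function name `ppca_em`, its parameters, the random initialisation, and the fixed iteration count are all illustrative choices that do not come from the text above.

```python
import numpy as np


def ppca_em(T, q, n_iter=200, seed=0):
    """Sketch of EM for an isotropic-noise latent variable model (assumed form).

    T : (N, d) array of observed data vectors, one per row.
    q : assumed latent dimension (number of principal axes sought).
    Returns W (d, q), mu (d,), and the noise variance sigma2.
    """
    rng = np.random.default_rng(seed)
    N, d = T.shape
    mu = T.mean(axis=0)              # ML estimate of the mean
    Tc = T - mu                      # centred data
    W = rng.standard_normal((d, q))  # arbitrary starting point (assumption)
    sigma2 = 1.0

    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables.
        M = W.T @ W + sigma2 * np.eye(q)
        M_inv = np.linalg.inv(M)
        Ex = Tc @ W @ M_inv                    # rows are <x_n>
        Exx = N * sigma2 * M_inv + Ex.T @ Ex   # sum over n of <x_n x_n^T>

        # M-step: re-estimate the loading matrix and the noise variance.
        W_new = (Tc.T @ Ex) @ np.linalg.inv(Exx)
        sigma2 = (
            np.sum(Tc ** 2)
            - 2.0 * np.sum((Tc @ W_new) * Ex)
            + np.trace(Exx @ W_new.T @ W_new)
        ) / (N * d)
        W = W_new

    return W, mu, sigma2
```

Under these assumptions, the columns of the returned `W` should span the same subspace as the leading `q` eigenvectors of the sample covariance, although they need not be orthogonal or aligned with the individual principal axes. An orthonormal basis for that subspace can be recovered afterwards, for example with `np.linalg.qr(W)`.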
