Unifying linear dimensionality reduction

Linear dimensionality reduction methods are a cornerstone of analyzing high-dimensional data, owing to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of interest, such as covariance, dynamical structure, correlation between data sets, input-output relationships, and margin between data classes. Methods have been developed under a variety of names and motivations in many fields, and perhaps as a result the connections among these methods have not been highlighted. Here we survey methods from this disparate literature as optimization programs over matrix manifolds. We discuss principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlations analysis, maximum autocorrelation factors, slow feature analysis, sufficient dimensionality reduction, undercomplete independent component analysis, linear regression, distance metric learning, and more. This optimization framework gives insight into some rarely discussed shortcomings of well-known methods, such as the suboptimality of certain eigenvector solutions. Modern techniques for optimization over matrix manifolds enable a generic linear dimensionality reduction solver, which accepts data and an objective to be optimized as input and returns an optimal low-dimensional projection of the data as output. This simple optimization framework further allows straightforward generalizations and novel variants of classical methods, which we demonstrate here by creating an orthogonal-projection canonical correlations analysis. More broadly, this survey and generic solver suggest that linear dimensionality reduction can move toward becoming a black-box, objective-agnostic numerical technology.
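
To make the idea of a generic solver concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plain gradient descent over matrices with orthonormal columns (the Stiefel manifold) with a QR-based retraction, and applies it to the principal component analysis objective, maximizing trace(MᵀCM) for a sample covariance C. The names `stiefel_descent` and `pca_projection`, the step-size rule, and the iteration count are hypothetical choices made for illustration only.

```python
import numpy as np

def stiefel_descent(grad_f, M0, step, iters=500):
    """Hypothetical generic solver: minimize f(M) subject to M.T @ M = I.

    grad_f returns the ordinary (Euclidean) gradient of the objective at M.
    """
    M = M0
    for _ in range(iters):
        G = grad_f(M)
        # Project the Euclidean gradient onto the tangent space at M.
        sym = 0.5 * (M.T @ G + G.T @ M)
        xi = G - M @ sym
        # Retract back onto the manifold with a QR factorization.
        Q, R = np.linalg.qr(M - step * xi)
        # Fix column signs so the retraction varies continuously with M.
        signs = np.where(np.diag(R) < 0, -1.0, 1.0)
        M = Q * signs
    return M

def pca_projection(X, k, seed=0):
    """PCA posed as an optimization program: minimize -trace(M.T @ C @ M)."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(Xc) - 1)
    rng = np.random.default_rng(seed)
    # Random orthonormal starting point.
    M0, _ = np.linalg.qr(rng.standard_normal((X.shape[1], k)))
    # Assumed step size: inverse of the gradient's Lipschitz constant.
    step = 0.5 / np.linalg.norm(C, 2)
    M = stiefel_descent(lambda M: -2.0 * C @ M, M0, step)
    return M, Xc @ M

# Usage on synthetic data whose variance is concentrated in two directions.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2])
M, Y = pca_projection(X, k=2)
```

Because the PCA objective is unchanged when M is rotated within its column space, this sketch identifies only the subspace, so any comparison with an eigenvector solution should use the projector M Mᵀ rather than M itself. Swapping in the gradient of a different objective reuses `stiefel_descent` unchanged, which is the sense in which such a solver is objective-agnostic.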
