Dimension Reduction: A Guided Tour

We give a tutorial overview of several geometric methods for dimension reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, canonical correlation analysis, oriented PCA, and several techniques for sufficient dimension reduction. For the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian
