Spectral Dimensionality Reduction

In this paper, we study and place under a common framework a number of non-linear dimensionality reduction methods, such as Locally Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, and kernel PCA, which are based on performing an eigen-decomposition (hence the name 'spectral'). The framework also includes classical methods such as PCA and metric multidimensional scaling (MDS), as well as the data transformation step used in spectral clustering. We show that in all of these cases the learning algorithm estimates the principal eigenfunctions of an operator that depends on the unknown data density and on a kernel that is not necessarily positive semi-definite. This view helps to generalize some of these algorithms so that they can predict the embedding of out-of-sample examples without retraining the model. It also makes more transparent what these algorithms minimize on the empirical data and gives a corresponding notion of generalization error.
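To make the shared recipe concrete, here is a minimal sketch (our own illustration, not code from the paper): it double-centers a Gaussian Gram matrix, takes its leading eigenvectors to embed the training points as in kernel PCA, and then applies a Nyström-style projection to place a new point in the same embedding without recomputing the eigen-decomposition. The Gaussian kernel, the bandwidth `sigma`, the target dimension `d`, and the function name are illustrative assumptions; the other spectral methods covered by the framework correspond to other data-dependent kernels.

```python
import numpy as np

def spectral_embedding_with_nystrom(X, x_new, sigma=1.0, d=2):
    """Sketch of the shared spectral recipe: kernel-PCA-style embedding of the
    training set plus a Nystrom-style out-of-sample extension for a new point.
    The Gaussian kernel and the parameters sigma, d are illustrative choices."""
    n = X.shape[0]

    def gram(A, B):
        # Pairwise squared distances, then Gaussian kernel values.
        sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * sigma**2))

    # Data-dependent (double-centered) Gram matrix, as in kernel PCA / metric MDS.
    K = gram(X, X)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H

    # Leading eigenvectors of the centered Gram matrix give the training embedding.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    top = np.argsort(eigvals)[::-1][:d]
    lam, V = eigvals[top], eigvecs[:, top]
    train_embedding = V * np.sqrt(lam)                 # n x d coordinates

    # Out-of-sample extension: center the new point's kernel row against the
    # training data, then project onto the same eigenvectors (Nystrom formula).
    k_new = gram(x_new[None, :], X).ravel()
    k_new_c = k_new - k_new.mean() - K.mean(axis=0) + K.mean()
    new_embedding = (k_new_c @ V) / np.sqrt(lam)       # d coordinates, no retraining

    return train_embedding, new_embedding
```

Feeding a training row back in as `x_new` reproduces that row of `train_embedding` up to numerical error, which is the consistency one expects from the out-of-sample extension described above.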

[1] W. Torgerson. Multidimensional scaling: I. Theory and method, 1952.

[2] J. Gower. Adding a point to vector diagrams in multivariate analysis, 1968.

[3] E. Kreyszig. Introductory Functional Analysis With Applications, 1978.

[4] R. Taylor, et al. The Numerical Treatment of Integral Equations, 1978.

[5] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[6] Eric Saund, et al. Dimensionality-Reduction Using Connectionist Networks, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[7] Teuvo Kohonen, et al. The self-organizing map, 1990.

[8] Audra E. Kosh, et al. Linear Algebra and its Applications, 1992.

[9] George Karypis, et al. Introduction to Parallel Computing, 1994.

[10] Jonathan Baxter, et al. Learning internal representations, 1995, COLT '95.

[11] Geoffrey E. Hinton, et al. The EM algorithm for mixtures of factor analyzers, 1996.

[12] Shang-Hua Teng, et al. Spectral partitioning works: planar graphs and finite element meshes, 1996, Proceedings of the 37th Conference on Foundations of Computer Science.

[13] Jon A. Wellner, et al. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[14] Fan Chung. Spectral Graph Theory, 1996.

[15] Bernhard Schölkopf. Support vector learning, 1997.

[16] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17] Bernhard Schölkopf, et al. Nonlinear Component Analysis as a Kernel Eigenvalue Problem, 1998, Neural Computation.

[18] V. Koltchinskii. Asymptotics of Spectral Projections of Some Random Matrices Approximating Integral Operators, 1998.

[19] B. Schölkopf, et al. Advances in kernel methods: support vector learning, 1999.

[20] Yair Weiss, et al. Segmentation using eigenvectors: a unifying view, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[21] J. Tenenbaum, et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.

[22] Christopher K. I. Williams, et al. The Effect of the Input Density Distribution on Kernel-based Classifiers, 2000, ICML.

[23] V. Koltchinskii, et al. Random matrix approximation of spectra of integral operators, 2000.

[24] S. T. Roweis, et al. Nonlinear dimensionality reduction by locally linear embedding, 2000, Science.

[25] Christopher K. I. Williams, et al. Using the Nyström Method to Speed Up Kernel Machines, 2000, NIPS.

[26] Trevor F. Cox, et al. Metric multidimensional scaling, 2000.

[27] Nello Cristianini, et al. On the Concentration of Spectral Properties, 2001, NIPS.

[28] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[29] John Shawe-Taylor, et al. The Stability of Kernel Principal Components Analysis and its Relation to the Process Eigenspectrum, 2002, NIPS.

[30] Mikhail Belkin, et al. Using manifold structure for partially labelled classification, 2002, NIPS.

[31] Dimitrios Gunopulos, et al. Non-linear dimensionality reduction techniques for classification and visualization, 2002, KDD.

[32] Pascal Vincent, et al. Manifold Parzen Windows, 2002, NIPS.

[33] Yee Whye Teh, et al. Automatic Alignment of Local Representations, 2002, NIPS.

[34] Matthew Brand, et al. Charting a Manifold, 2002, NIPS.

[35] Adam Krzyzak, et al. Piecewise Linear Skeletonization Using Principal Curves, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[36] Balázs Kégl, et al. Intrinsic Dimension Estimation Using Packing Numbers, 2002, NIPS.

[37] Joshua B. Tenenbaum, et al. Global Versus Local Methods in Nonlinear Dimensionality Reduction, 2002, NIPS.

[38] Yoshua Bengio, et al. Spectral Clustering and Kernel PCA are Learning Eigenfunctions, 2003.

[39] D. Donoho, et al. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data, 2003, Proceedings of the National Academy of Sciences of the United States of America.

[40] Nicolas Le Roux, et al. Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering, 2003, NIPS.

[41] Mikhail Belkin, et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, 2003, Neural Computation.

[42] Lawrence K. Saul, et al. Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds, 2003, J. Mach. Learn. Res.

[43] Nikos A. Vlassis, et al. Non-linear CCA and PCA by Alignment of Local Models, 2003, NIPS.

[44] D. Donoho, et al. Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, 2003.

[45] Gilles Blanchard, et al. Statistical properties of Kernel Principal Component Analysis, 2019.

[46] Bernhard Schölkopf, et al. A kernel view of the dimensionality reduction of manifolds, 2004, ICML.

[47] Nicolas Le Roux, et al. Learning Eigenfunctions Links Spectral Embedding and Kernel PCA, 2004, Neural Computation.

[48] H. Bourlard, et al. Auto-association by multilayer perceptrons and singular value decomposition, 1988, Biological Cybernetics.

[49] Christopher K. I. Williams. On a Connection between Kernel PCA and Metric Multidimensional Scaling, 2004, Machine Learning.

[50] José Carlos Príncipe, et al. Nonlinear Component Analysis Based on Correntropy, 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.

[51] Laurent Zwald. Statistical properties of kernel principal component analysis, 2006, Machine Learning.

[52] T. Hastie, et al. Principal Curves, 2007.