Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds

The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to be sampled from an underlying manifold, are mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE---though capable of generating highly nonlinear embeddings---are simple to implement, and they do not involve local minima. In this paper, we describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. We present results of the algorithm applied to data sampled from known manifolds, as well as to collections of images of faces, lips, and handwritten digits. These examples are used to provide extensive illustrations of the algorithm's performance---both successes and failures---and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.
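The three steps summarized above (find neighbors, solve for locally linear reconstruction weights, compute the embedding from a sparse eigenvalue problem) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's reference implementation: the function name `lle`, the dense pairwise-distance neighbor search, and the regularization constant `reg` are choices made here for clarity, and a practical version would use sparse matrices and an iterative eigensolver as the paper discusses.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Minimal LLE sketch. X has shape (n_samples, n_features)."""
    n = X.shape[0]
    # Step 1: k nearest neighbors of each point by Euclidean distance
    # (dense O(n^2) search; fine for small n, illustrative only).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :n_neighbors]
    # Step 2: reconstruction weights expressing each point as an
    # affine combination of its neighbors (weights sum to one).
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                 # neighbors centered on x_i
        C = Z @ Z.T                           # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(n_neighbors)  # regularize if k > dim
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()
    # Step 3: embedding coordinates are the bottom eigenvectors of
    # M = (I - W)^T (I - W), discarding the constant unit eigenvector.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return eigvecs[:, 1:n_components + 1]
```

Because the weight matrix W couples each point only to its k neighbors, M is sparse in practice, which is what makes the final eigenvalue problem tractable at scale.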
