Dimensionality reduction: beyond the Johnson-Lindenstrauss bound

Dimension reduction of metric data has become a useful technique with numerous applications. The celebrated Johnson-Lindenstrauss lemma states that any n-point subset of Euclidean space can be embedded in dimension O(ε^{-2} log n) with distortion 1 + ε. This bound is known to be nearly tight. In many applications, however, the requirement that all distances be nearly preserved is too strong. In this paper we show that, under natural relaxations of the goal of the embedding, an improved dimension reduction is possible in which the target dimension is independent of n. Our main result can be viewed as a local dimension reduction. There are a variety of empirical situations in which small distances are meaningful and reliable, but larger ones are not. Such situations arise in source coding, image processing, computational biology, and other applications, and are the motivation for widely used heuristics such as Isomap and Locally Linear Embedding. Pursuing a line of work begun by Whitney, Nash showed that every C^1 manifold of dimension d can be embedded in R^{2d+2} in such a manner that the local structure at each point is preserved isometrically. Our work is an analog of Nash's for discrete subsets of Euclidean space. In place of perfect preservation of infinitesimal neighborhoods, we require near-isometric embedding of neighborhoods of bounded cardinality. We show that any finite subset of Euclidean space can be embedded in dimension O(ε^{-2} log k) while preserving, with distortion 1 + ε, the distances within a "core neighborhood" of each point. (The core neighborhood is a metric ball around the point whose radius is a substantial fraction of the radius of the k-neighborhood, the ball of cardinality k.) When the metric space satisfies a weak growth rate property, the guarantee applies to the entire k-neighborhood (with some dependency of the embedding dimension on the growth rate).
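To make the notion of local distortion concrete, the following is a minimal sketch, not the construction of this paper. It applies a classical Gaussian random projection (the standard route to the Johnson-Lindenstrauss bound) and then measures distortion only over pairs in which one point is among the k nearest neighbors of the other. The function names `jl_project`, `pairwise_distances`, and `knn_distortion`, and all parameter values, are illustrative assumptions; the target dimension is chosen by hand rather than derived from the O(ε^{-2} log k) bound.

```python
import numpy as np


def jl_project(points, target_dim, seed=0):
    """Project rows of `points` with a scaled Gaussian matrix (classical JL map).

    Illustration only: this is a single global random map, not the local
    embedding described in the abstract.
    """
    rng = np.random.default_rng(seed)
    source_dim = points.shape[1]
    # Scaling by 1/sqrt(target_dim) preserves squared norms in expectation.
    gaussian = rng.standard_normal((source_dim, target_dim)) / np.sqrt(target_dim)
    return points @ gaussian


def pairwise_distances(points):
    """Euclidean distance matrix computed from the Gram matrix."""
    sq_norms = np.sum(points ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (points @ points.T)
    dists = np.sqrt(np.maximum(sq_dists, 0.0))
    np.fill_diagonal(dists, 0.0)  # remove round-off on the diagonal
    return dists


def knn_distortion(points, embedded, k):
    """Smallest and largest ratio (embedded distance / original distance),
    taken over pairs (x, y) where y is one of the k nearest neighbors of x.
    Assumes the input points are distinct.
    """
    original = pairwise_distances(points)
    mapped = pairwise_distances(embedded)
    n = points.shape[0]
    # Column 0 of the sorted order is the point itself, so skip it.
    neighbors = np.argsort(original, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = neighbors.ravel()
    ratios = mapped[rows, cols] / original[rows, cols]
    return ratios.min(), ratios.max()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.standard_normal((200, 1000))   # hypothetical test data
    low_dim = jl_project(data, target_dim=400, seed=2)
    print(knn_distortion(data, low_dim, k=10))
```

The point of restricting the measurement to k-nearest-neighbor pairs is that the number of constrained pairs per point depends on k rather than on n, which is the intuition behind a target dimension that is independent of n.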
We also show how to obtain a global embedding that additionally keeps distant points well separated (at the cost of a dependency on the doubling dimension of the space). As an application of our methods we obtain an (Assouad-style) dimension reduction for finite subsets of Euclidean space where the metric is raised to some fractional power (the resulting metrics are known as snowflakes). We show that any such metric X can be embedded in dimension Õ(ε^{-3} dim(X)) with distortion 1 + ε, where dim(X) is the doubling dimension, a measure of the intrinsic dimension of the set. This result improves recent work by Gottlieb and Krauthgamer [20] to a nearly tight bound. The new dimension reduction results are useful for applications such as clustering and distance labeling.
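For reference, the two notions used in the snowflake result can be written out using their standard definitions. The symbols α, λ(X), B(x, r), and m are notation introduced here for illustration and do not appear in the abstract.

```latex
% Snowflake of a finite set X in Euclidean space, for a fixed exponent 0 < \alpha < 1:
\[
  d_\alpha(x, y) \;=\; \lVert x - y \rVert_2^{\alpha}, \qquad x, y \in X .
\]

% Doubling constant and doubling dimension of a metric space X:
\[
  \lambda(X) \;=\; \min\bigl\{ \lambda : \text{every ball } B(x, r)
  \text{ can be covered by } \lambda \text{ balls of radius } r/2 \bigr\},
  \qquad
  \dim(X) \;=\; \log_2 \lambda(X) .
\]

% Target dimension stated in the abstract for a (1 + \varepsilon)-distortion embedding
% of the snowflake metric:
\[
  m \;=\; \tilde{O}\bigl( \varepsilon^{-3} \dim(X) \bigr) .
\]
```

Here Õ hides logarithmic factors, as is conventional; the abstract does not spell out which factors are suppressed.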

[1]  I. J. Schoenberg,et al.  Metric spaces and positive definite functions , 1938 .

[2]  H. Whitney The Self-Intersections of a Smooth n-Manifold in 2n-Space , 1944 .

[3]  J. Nash C^1 Isometric Imbeddings , 1954 .

[4]  Y. Wong,et al.  Differentiable Manifolds , 2009 .

[5]  J. Bourgain On Lipschitz embedding of finite metric spaces in Hilbert space , 1985 .

[6]  Y. Gordon On Milman's inequality and random subspaces which escape through a mesh in ℝ^n , 1988 .

[7]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[8]  Songwei Qian ɛ-Isometric Embeddings , 1995 .

[9]  Yair Bartal,et al.  Probabilistic approximation of metric spaces and its algorithmic applications , 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[10]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Satish Rao,et al.  Small distortion and volume preserving embeddings for planar and Euclidean metrics , 1999, SCG '99.

[12]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[13]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[14]  Mikkel Thorup,et al.  Approximate distance oracles , 2001, JACM.

[15]  U. Lang,et al.  Bilipschitz Embeddings of Metric Spaces into Space Forms , 2001 .

[16]  Mukund Balasubramanian,et al.  The isomap algorithm and topological stability. , 2002, Science.

[17]  Matthew Brand,et al.  Charting a Manifold , 2002, NIPS.

[18]  Maja J. Mataric,et al.  Deriving action and behavior primitives from human motion data , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[19]  D. Donoho,et al.  Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[20]  Robert Krauthgamer,et al.  Bounded geometries, fractals, and low-distortion embeddings , 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings..

[21]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[22]  Robert Krauthgamer,et al.  The intrinsic dimensionality of graphs , 2003, STOC '03.

[23]  D. Donoho,et al.  Hessian Eigenmaps : new locally linear embedding techniques for high-dimensional data , 2003 .

[24]  P. Assouad Plongements lipschitziens dans R^n , 2003 .

[25]  Sanjoy Dasgupta,et al.  An elementary proof of a theorem of Johnson and Lindenstrauss , 2003, Random Struct. Algorithms.

[26]  A. Naor,et al.  Euclidean quotients of finite metric spaces , 2004, math/0406349.

[27]  Kilian Q. Weinberger,et al.  Unsupervised Learning of Image Manifolds by Semidefinite Programming , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[28]  Yair Bartal,et al.  Dimension reduction for ultrametrics , 2004, SODA '04.

[29]  Jens Nilsson,et al.  Approximate geodesic distances reveal biologically relevant structures in microarray data , 2004, Bioinform..

[30]  James R. Lee,et al.  On distance scales, embeddings, and efficient relaxations of the cut cone , 2005, SODA '05.

[31]  Yuxiao Hu,et al.  Face recognition using Laplacianfaces , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  M. Talagrand The Generic Chaining , 2005 .

[33]  S. Mendelson,et al.  Empirical processes and random projections , 2005 .

[34]  Lydia E Kavraki,et al.  Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction , 2006, Proc. Natl. Acad. Sci. USA.

[35]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[36]  Ittai Abraham,et al.  Advances in metric embedding theory , 2006, STOC '06.

[37]  Ittai Abraham,et al.  Local embeddings of metric spaces , 2007, STOC '07.

[38]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[39]  B. Recht,et al.  A Nash-type Dimensionality Reduction for Discrete Subsets of L_2 , 2008 .

[40]  Ittai Abraham,et al.  Embedding metric spaces in their intrinsic dimension , 2008, SODA '08.

[41]  Richard G. Baraniuk,et al.  Random Projections of Smooth Manifolds , 2009, Found. Comput. Math..

[42]  Gideon Schechtman,et al.  Lower Bounds for Local Versions of Dimension Reductions , 2009, Discret. Comput. Geom..

[43]  Ittai Abraham,et al.  On low dimensional local embeddings , 2009, SODA.

[44]  Noga Alon,et al.  Perturbed Identity Matrices Have High Rank: Proof and Applications , 2009, Combinatorics, Probability and Computing.

[45]  Gábor Tardos,et al.  A constructive proof of the general Lovász local lemma , 2009, JACM.

[46]  Lee-Ad Gottlieb,et al.  A Nonlinear Approach to Dimension Reduction , 2009, SODA '11.

[47]  P. Erdős, L. Lovász  Problems and Results on 3-chromatic Hypergraphs and Some Related Questions , 2022 .