Dimensionality reduction via the Johnson–Lindenstrauss Lemma: theoretical and empirical bounds on embedding dimension

The Johnson–Lindenstrauss (JL) lemma has led to the development of tools for handling high-dimensional datasets. The lemma asserts that a set of high-dimensional points can be projected into a lower-dimensional space while approximately preserving their pairwise distances. Significant improvements to the JL lemma since its inception are summarized. Particular focus is placed on reproving Matoušek’s versions of the lemma (Random Struct Algorithms 33(2):142–156, 2008), first using subgaussian projection coefficients and then using sparse projection coefficients. The results of the lemma are illustrated using simulated data. The simulation suggests that a projection can reduce dimensionality more effectively than the theory guarantees. This more effective projection was then applied to a very large natural, rather than simulated, dataset, further strengthening the empirical evidence that a bound on the embedding dimension better than the proven optimal lower bound exists. Additionally, we provide comparisons with other commonly used data reduction and simplification techniques.
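
The kind of experiment described above can be sketched in a few lines. The following is a minimal illustration, not the authors' simulation. It assumes the classical Dasgupta–Gupta form of the sufficient dimension, k >= 4 ln(n) / (eps^2/2 - eps^3/3), a Gaussian projection as one example of subgaussian coefficients, and an Achlioptas-style matrix with sparsity parameter s = 3 as one example of sparse coefficients. All function names and the data sizes are hypothetical choices made for this sketch.

```python
import numpy as np


def jl_dimension(n_points, eps):
    """Sufficient embedding dimension in the Dasgupta-Gupta form (an assumption here):
    k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return int(np.ceil(4.0 * np.log(n_points) / (eps**2 / 2.0 - eps**3 / 3.0)))


def gaussian_projection(X, k, rng):
    """Project rows of X to k dimensions with i.i.d. N(0, 1/k) coefficients."""
    d = X.shape[1]
    R = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ R


def sparse_projection(X, k, rng, s=3):
    """Achlioptas-style sparse projection: entries are +sqrt(s/k) or -sqrt(s/k)
    with probability 1/(2s) each, and 0 with probability 1 - 1/s."""
    d = X.shape[1]
    probs = [1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)]
    R = rng.choice([-1.0, 0.0, 1.0], size=(d, k), p=probs) * np.sqrt(s / k)
    return X @ R


def squared_distance_ratios(X, Y):
    """Ratio of projected to original squared distance for every pair of rows."""
    i, j = np.triu_indices(X.shape[0], k=1)
    original = np.sum((X[i] - X[j]) ** 2, axis=1)
    projected = np.sum((Y[i] - Y[j]) ** 2, axis=1)
    return projected / original


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, eps = 30, 2000, 0.5          # illustrative sizes, not from the paper
    X = rng.standard_normal((n, d))    # simulated high-dimensional points

    k = jl_dimension(n, eps)
    for name, project in [("gaussian", gaussian_projection),
                          ("sparse", sparse_projection)]:
        ratios = squared_distance_ratios(X, project(X, k, rng))
        print(f"{name}: k={k}, min ratio={ratios.min():.3f}, "
              f"max ratio={ratios.max():.3f}")
```

Rerunning the loop with values of k smaller than `jl_dimension(n, eps)` and checking whether the ratios still fall inside [1 - eps, 1 + eps] is one simple way to probe, on a given dataset, how conservative the theoretical dimension is.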
