Near-Optimal Bounds for Binary Embeddings of Arbitrary Sets

We study embedding a subset $K$ of the unit sphere into the Hamming cube $\{-1,+1\}^m$. We characterize the tradeoff between distortion and sample complexity $m$ in terms of the Gaussian width $\omega(K)$ of the set. For subspaces and several structured sets, we show that Gaussian maps provide the optimal tradeoff $m\sim \delta^{-2}\omega^2(K)$; in particular, for a subspace of dimension $d$, distortion $\delta$ requires $m\approx\delta^{-2}d$. For general sets, we provide sharp characterizations that reduce to $m\approx\delta^{-4}\omega^2(K)$ after simplification. We provide improved results for local embedding of points that are close to each other, which is related to locality-sensitive hashing. We also discuss faster binary embedding, where one takes advantage of an initial sketching procedure based on the Fast Johnson-Lindenstrauss Transform. Finally, we list several numerical observations and discuss open problems.
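
As an illustration of the kind of map discussed above, the following is a minimal sketch and not the paper's own code. It assumes the standard construction $x \mapsto \mathrm{sign}(Ax)$ with $A$ an $m\times n$ i.i.d. Gaussian matrix, and measures distortion as the gap between normalized Hamming distance and normalized angular distance. The function names (`binary_embed`, `normalized_hamming`, `angular_distance`) and the parameter values are hypothetical choices made for this example.

```python
import numpy as np

def binary_embed(X, m, rng):
    """Map each row of X (unit vectors in R^n) to {-1,+1}^m via sign(A x).

    A is an m-by-n matrix with i.i.d. standard normal entries (assumed map).
    """
    n = X.shape[1]
    A = rng.standard_normal((m, n))
    B = np.sign(X @ A.T)
    B[B == 0] = 1  # break exact ties so every entry lies in {-1,+1}
    return B

def normalized_hamming(b1, b2):
    """Fraction of coordinates in which two sign vectors differ."""
    return float(np.mean(b1 != b2))

def angular_distance(x, y):
    """Angle between unit vectors x and y, divided by pi (range [0, 1])."""
    return float(np.arccos(np.clip(x @ y, -1.0, 1.0)) / np.pi)

rng = np.random.default_rng(0)
n, d, m = 50, 5, 20000  # ambient dim, subspace dim, number of bits (illustrative)

# Sample unit vectors from a random d-dimensional subspace of R^n.
basis, _ = np.linalg.qr(rng.standard_normal((n, d)))
X = rng.standard_normal((10, d)) @ basis.T
X /= np.linalg.norm(X, axis=1, keepdims=True)

B = binary_embed(X, m, rng)

# Largest gap between Hamming and angular distance over all pairs.
max_err = max(
    abs(normalized_hamming(B[i], B[j]) - angular_distance(X[i], X[j]))
    for i in range(len(X))
    for j in range(i + 1, len(X))
)
print(f"max distortion over pairs: {max_err:.4f}")
```

For a fixed pair of points, each bit disagrees with probability equal to the normalized angle, so the observed gap shrinks roughly like $1/\sqrt{m}$; the uniform guarantees over a whole set $K$ stated in the abstract are what bring $\omega(K)$ into the sample complexity.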
