Dimension Reduction Techniques for ℓp (1<p<2)

For Euclidean space (ℓ_2), there exists the powerful dimension reduction transform of Johnson and Lindenstrauss [Conf. in modern analysis and probability, AMS 1984], with a host of known applications. Here, we consider the problem of dimension reduction for all ℓ_p spaces with 1<p<2. Although strong lower bounds are known for dimension reduction in ℓ_1, Ostrovsky and Rabani [JACM 2002] successfully circumvented these by presenting an ℓ_1 embedding that maintains fidelity only within a bounded range of distances, with applications to clustering and nearest neighbor search. However, their embedding techniques are specific to ℓ_1 and do not naturally extend to other norms. In this paper, we apply a range of advanced techniques and produce bounded-range dimension reduction embeddings for all 1<p<2, thereby demonstrating that the approach initiated by Ostrovsky and Rabani for ℓ_1 can be extended to a much more general framework. We also obtain improved bounds in terms of the intrinsic dimensionality. As a result, we achieve improved bounds for proximity problems including snowflake embeddings and clustering.
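The abstract takes the Johnson-Lindenstrauss transform for ℓ_2 as its starting point without describing it. As a rough illustration only, the sketch below implements one standard construction of that transform: a dense k x d matrix of i.i.d. standard Gaussians, scaled by 1/√k, followed by a check of the worst relative pairwise distortion. It is not the embedding developed in this paper, which targets ℓ_p for 1<p<2 and preserves distances only within a bounded range. The function names (`jl_project`, `l2`, `max_distortion`) and the constant 8 in the target dimension are illustrative assumptions.

```python
import math
import random


def jl_project(points, k, seed=0):
    """Project d-dimensional points to k dimensions with a dense Gaussian matrix.

    Entries of the k x d matrix are i.i.d. N(0, 1); the image is scaled by
    1/sqrt(k) so that squared Euclidean norms are preserved in expectation.
    """
    d = len(points[0])
    rng = random.Random(seed)
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(r * x for r, x in zip(row, p)) for row in matrix] for p in points]


def l2(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def max_distortion(points, projected):
    """Largest relative error |d'/d - 1| over all pairs of distinct points."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            original = l2(points[i], points[j])
            if original > 0:  # skip coincident points
                ratio = l2(projected[i], projected[j]) / original
                worst = max(worst, abs(ratio - 1.0))
    return worst


if __name__ == "__main__":
    n, d, eps = 20, 200, 0.5
    # Target dimension k = O(log n / eps^2); the constant 8 is an illustrative choice.
    k = math.ceil(8 * math.log(n) / eps ** 2)
    rng = random.Random(1)
    pts = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    proj = jl_project(pts, k)
    print(f"k = {k}, worst pairwise distortion = {max_distortion(pts, proj):.3f}")
```

Running the script prints the target dimension (96 for n = 20 and eps = 0.5) together with the worst distortion observed for that seed. The dense Gaussian matrix is chosen here for simplicity; sparser or faster constructions, such as those in [16], [66] and [70], reduce the cost of computing the projection.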

[1] Rafail Ostrovsky, et al. Polynomial time approximation schemes for geometric k-clustering, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[2] Noga Alon, et al. The space complexity of approximating the frequency moments, 1996, STOC '96.

[3] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.

[4] Richard Cole, et al. Searching dynamic point sets in spaces with bounded doubling dimension, 2006, STOC '06.

[5] James R. Lee, et al. Extending Lipschitz functions via random metric partitions, 2005.

[6] Funda Ergün, et al. Oblivious string embeddings and edit distance approximations, 2006, SODA '06.

[7] Rina Panigrahy, et al. Entropy based nearest neighbor search in high dimensions, 2005, SODA '06.

[8] Wolfgang Gawronski, et al. On the bell-shape of stable densities, 1984.

[9] Leonard J. Schulman, et al. Dimensionality reduction: beyond the Johnson-Lindenstrauss bound, 2011, SODA '11.

[10] Morteza Zadimoghaddam, et al. Ordinal Embedding: Approximation Algorithms and Dimensionality Reduction, 2008, APPROX-RANDOM.

[11] Leonard J. Schulman, et al. Clustering for edge-cost minimization (extended abstract), 2000, STOC '00.

[12] P. Assouad. Plongements lipschitziens dans ℝⁿ, 1983.

[13] Jessica H. Fong, et al. An Approximate Lp Difference Algorithm for Massive Data Streams, 1999, Discret. Math. Theor. Comput. Sci.

[14] James R. Lee, et al. On distance scales, embeddings, and efficient relaxations of the cut cone, 2005, SODA '05.

[15] Lee-Ad Gottlieb, et al. A Linear Time Approximation Scheme for Euclidean TSP, 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.

[16] Dimitris Achlioptas, et al. Database-friendly random projections, 2001, PODS.

[17] Ping Li, et al. Estimators and tail bounds for dimension reduction in lα (0 < α ≤ 2) using stable random projections, 2008, SODA '08.

[18] Rafail Ostrovsky, et al. Efficient search for approximate nearest neighbor in high dimensional spaces, 1998, STOC '98.

[19] W. B. Johnson, et al. Extensions of Lipschitz mappings into Hilbert space, 1984.

[20] Sariel Har-Peled, et al. Fast construction of nets in low dimensional metrics, and their applications, 2004, SCG.

[21] Rafail Ostrovsky, et al. Polynomial-time approximation schemes for geometric min-sum median clustering, 2002, JACM.

[22] Kenneth Ward Church, et al. Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections, 2006, J. Mach. Learn. Res.

[23] Alexandr Andoni, et al. Hardness of Nearest Neighbor under L-infinity, 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[24] Kunal Talwar, et al. Ultra-low-dimensional embeddings for doubling metrics, 2008, SODA 2008.

[25] Nir Ailon, et al. An almost optimal unrestricted fast Johnson-Lindenstrauss transform, 2010, SODA '11.

[26] Piotr Indyk, et al. Approximate clustering via core-sets, 2002, STOC '02.

[27] J. R. Lee, et al. Embedding the diamond graph in Lp and dimension reduction in L1, 2004, math/0407520.

[28] W. Hoeffding. Probability Inequalities for sums of Bounded Random Variables, 1963.

[29] Jon M. Kleinberg, et al. Two algorithms for nearest-neighbor search in high dimensions, 1997, STOC '97.

[30] M. Talagrand. Embedding subspaces of L1 into ℓ1^N, 1990.

[31] Huy L. Nguyen. Approximate Nearest Neighbor Search in ℓp, 2013, ArXiv.

[32] G. Schechtman. More on embedding subspaces of Lp in ℓr^n, 1987.

[33] Nicole Immorlica, et al. Locality-sensitive hashing scheme based on p-stable distributions, 2004, SCG '04.

[34] A. Naor, et al. Euclidean quotients of finite metric spaces, 2004, math/0406349.

[35] V. Zolotarev. One-dimensional stable distributions, 1986.

[36] David P. Woodruff, et al. Optimal approximations of the frequency moments of data streams, 2005, STOC '05.

[37] Ittai Abraham, et al. Embedding metric spaces in their intrinsic dimension, 2008, SODA '08.

[38] Adam Krzyżak, et al. Dimension Reduction Techniques, 2002.

[39] Piotr Indyk, et al. Algorithmic applications of low-distortion geometric embeddings, 2001, Proceedings 42nd IEEE Symposium on Foundations of Computer Science.

[40] Gideon Schechtman, et al. Dimension Reduction in Lp, 0 < p < 2, 2011.

[41] Assaf Naor, et al. The Johnson–Lindenstrauss Lemma Almost Characterizes Hilbert Space, But Not Quite, 2008, SODA.

[42] Piotr Indyk, et al. Nearest-neighbor-preserving embeddings, 2007, TALG.

[43] Mahesh Viswanathan, et al. An Approximate L1-Difference Algorithm for Massive Data Streams, 2002, SIAM J. Comput.

[44] Makoto Yamazato, et al. Unimodality of Infinitely Divisible Distribution Functions of Class L, 1978.

[45] Rina Panigrahy, et al. Minimum Enclosing Polytope in High Dimensions, 2004, ArXiv.

[46] G. Bennett. Probability Inequalities for the Sum of Independent Random Variables, 1962.

[47] Piotr Indyk, et al. Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality, 2012, Theory Comput.

[48] Moshe Lewenstein, et al. Fast, precise and dynamic distance queries, 2011, SODA '11.

[49] V. Milman, et al. Asymptotic Theory Of Finite Dimensional Normed Spaces, 1986.

[50] Noga Alon, et al. Tracking join and self-join sizes in limited storage, 1999, PODS '99.

[51] J. Lindenstrauss, et al. Geometric Nonlinear Functional Analysis, 1999.

[52] Peter Hall, et al. On Unimodality and Rates of Convergence for Stable Laws, 1984.

[53] Tomás Feder, et al. Optimal algorithms for approximate clustering, 1988, STOC '88.

[54] J. Matoušek, et al. On the distortion required for embedding finite metric spaces into normed spaces, 1996.

[55] Suresh Venkatasubramanian, et al. The Johnson-Lindenstrauss Transform: An Empirical Study, 2011, ALENEX.

[56] Tyler Neylon, et al. A locality-sensitive hash for real vectors, 2010, SODA '10.

[57] M. Talagrand. Embedding subspaces of L1 into ℓ1^N, 1990.

[58] Alexandr Andoni, et al. Near Linear Lower Bound for Dimension Reduction in L1, 2011, 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science.

[59] Piotr Indyk. On approximate nearest neighbors in non-Euclidean spaces, 1998, Proceedings 39th Annual Symposium on Foundations of Computer Science.

[60] Yair Bartal, et al. Probabilistic approximation of metric spaces and its algorithmic applications, 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[61] Michel Deza, et al. Geometry of cuts and metrics, 2009, Algorithms and combinatorics.

[62] Noga Alon, et al. Ordinal embeddings of minimum relaxation: general properties, trees, and ultrametrics, 2005, SODA '05.

[63] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.

[64] Robert Krauthgamer, et al. Bounded geometries, fractals, and low-distortion embeddings, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.

[65] Pankaj K. Agarwal, et al. Exact and Approximation Algorithms for Clustering, 1997.

[66] Anirban Dasgupta, et al. A sparse Johnson-Lindenstrauss transform, 2010, STOC '10.

[67] Lee-Ad Gottlieb, et al. A Nonlinear Approach to Dimension Reduction, 2009, SODA '11.

[68] Alexandr Andoni, et al. Nearest neighbor search: the old, the new, and the impossible, 2009.

[69] Ken-iti Sato, et al. On distribution functions of class L, 1978.

[70] Bernard Chazelle, et al. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform, 2006, STOC '06.

[71] Robert Krauthgamer, et al. Navigating nets: simple algorithms for proximity search, 2004, SODA '04.

[72] J. L. Nolan. Stable Distributions. Models for Heavy Tailed Data, 2001.