Nonlinear Estimators and Tail Bounds for Dimension Reduction in l1 Using Cauchy Random Projections

The Johnson-Lindenstrauss lemma shows that any <i>n</i> points in Euclidean space (i.e., ℝ<sup><i>n</i></sup> with distances measured under the ℓ<sub>2</sub> norm) may be mapped down to <i>O</i>((log <i>n</i>)/ε<sup>2</sup>) dimensions such that no pairwise distance is distorted by more than a (1 + ε) factor. Determining whether such dimension reduction is possible in ℓ<sub>1</sub> has been an intriguing open question. We show strong lower bounds for general dimension reduction in ℓ<sub>1</sub>. We give an explicit family of <i>n</i> points in ℓ<sub>1</sub> such that any embedding with constant distortion <i>D</i> requires <i>n</i><sup>Ω(1/<i>D</i><sup>2</sup>)</sup> dimensions. This proves that there is no analog of the Johnson-Lindenstrauss lemma for ℓ<sub>1</sub>; in fact, embedding with any constant distortion requires <i>n</i><sup>Ω(1)</sup> dimensions. Further, embedding the points into ℓ<sub>1</sub> with (1 + ε) distortion requires <i>n</i><sup>½−<i>O</i>(ε log(1/ε))</sup> dimensions. Our proof establishes this lower bound for shortest path metrics of series-parallel graphs. We make extensive use of linear programming and duality in devising our bounds. We expect that the tools and techniques we develop will be useful for future investigations of embeddings into ℓ<sub>1</sub>.
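For intuition about the ℓ<sub>2</sub> guarantee that the abstract contrasts against, here is a minimal sketch of a Gaussian random projection, the standard construction behind the Johnson-Lindenstrauss lemma. It is not the lower-bound construction from this work, and it says nothing about ℓ<sub>1</sub>. The function names (`random_projection`, `distortion_range`), the point set, and the sizes (10 points, 500 input dimensions, 200 output dimensions) are all illustrative assumptions.

```python
import math
import random


def random_projection(points, k, rng):
    """Map each point to k dimensions using a k x d matrix of i.i.d. N(0, 1)
    entries, scaled by 1/sqrt(k) so squared l2 norms are preserved in expectation."""
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(r * x for r, x in zip(row, point)) for row in matrix]
        for point in points
    ]


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distortion_range(points, projected):
    """Smallest and largest ratio (projected distance / original distance)
    over all pairs of points."""
    n = len(points)
    ratios = [
        l2(projected[i], projected[j]) / l2(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return min(ratios), max(ratios)


# Hypothetical data: 10 random points in 500 dimensions, reduced to 200.
rng = random.Random(0)
points = [[rng.uniform(-1.0, 1.0) for _ in range(500)] for _ in range(10)]
projected = random_projection(points, 200, rng)
lo, hi = distortion_range(points, projected)
print(f"pairwise l2 distance ratios lie in [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the printed ratios should cluster around 1, which is the behavior the lemma guarantees for ℓ<sub>2</sub>. The lower bounds stated above show that no mapping into a comparably small number of dimensions can achieve this for every point set in ℓ<sub>1</sub>.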
