Overcoming the l1 non-embeddability barrier: algorithms for product metrics

A common approach to solving computational problems over a difficult metric space is to embed the "hard" metric into l1, which admits efficient algorithms and is thus considered an "easy" metric. This approach has proved successful or partially successful for important spaces such as the edit distance, but it also has inherent limitations: for some metrics it is provably impossible to go below a certain approximation factor. We propose a new approach: embedding the difficult space into richer host spaces, namely iterated products of standard spaces like l1 and l∞. We show that this class is rich, since useful metric spaces embed into it with only constant distortion, and at the same time tractable, since it admits efficient algorithms. Using this approach, we obtain, for example, the first nearest neighbor data structure with O(log log d) approximation for edit distance on non-repetitive strings (the Ulam metric). This approximation is exponentially better than the lower bound for embedding into l1. Furthermore, we give constant-factor approximations for two other computational problems. Along the way, we answer positively a question posed in [Ajtai, Jayram, Kumar, and Sivakumar, STOC 2002]. One of our algorithms has already found applications for smoothed edit distance over 0--1 strings [Andoni and Krauthgamer, ICALP 2008].
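To make the notion of an iterated product space concrete, the following minimal sketch shows how a distance in such a space could be evaluated. It is an illustration only, not the paper's construction: the helper names `l1`, `linf_product`, and `l1_product` are hypothetical, and the specific host spaces used in the paper are not reproduced here. A point in an l∞-product of l1 spaces is modeled as a list of vectors, and the distance is the maximum of the per-block l1 distances.

```python
from typing import Callable, Sequence

Metric = Callable[[Sequence, Sequence], float]

def l1(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain l1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def linf_product(base: Metric) -> Metric:
    """l-infinity product: the maximum of the base distances over the blocks."""
    def dist(xs: Sequence, ys: Sequence) -> float:
        return max(base(x, y) for x, y in zip(xs, ys))
    return dist

def l1_product(base: Metric) -> Metric:
    """l1 product: the sum of the base distances over the blocks."""
    def dist(xs: Sequence, ys: Sequence) -> float:
        return sum(base(x, y) for x, y in zip(xs, ys))
    return dist

# An l-infinity product of l1 spaces (two blocks, each a 2-dimensional l1 vector).
linf_of_l1 = linf_product(l1)

# Iterating once more: an l1 product whose blocks are l-infinity products of l1.
l1_of_linf_of_l1 = l1_product(linf_of_l1)

p = [[0, 0], [1, 2]]
q = [[3, 1], [1, 0]]

# Per-block l1 distances are 4 and 2, so the l-infinity product distance is 4.
d_single = linf_of_l1(p, q)

# Outer blocks contribute 4 (p vs q) and 0 (p vs p), so the l1 sum is 4.
d_iterated = l1_of_linf_of_l1([p, p], [q, p])
```

Because each combinator takes a metric and returns a metric, products can be nested to any depth, which mirrors the "iterated" structure described above.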