Improved Approximation Algorithms for Earth-Mover Distance in Data Streams

For two multisets $S$ and $T$ of points in $[\Delta]^2$ with $|S| = |T| = n$, the earth-mover distance (EMD) between $S$ and $T$ is the minimum cost of a perfect bipartite matching between the points of $S$ and the points of $T$, i.e., $\mathrm{EMD}(S,T) = \min_{\pi:S\rightarrow T}\sum_{a\in S}\|a-\pi(a)\|_1$, where $\pi$ ranges over all one-to-one mappings. The sketching complexity of approximating EMD on the two-dimensional grid is listed as an open problem in the literature. We give two algorithms for computing the EMD between two multisets when the number of distinct points in one of the sets is a small value $k=\log^{O(1)}(\Delta n)$. Our first algorithm gives a $(1+\epsilon)$-approximation using $O(k\epsilon^{-2}\log^{4}n)$ space and works only in the insertion-only model. Our second algorithm gives an $O(\min(k^3,\log\Delta))$-approximation using $O(\log^{3}\Delta\cdot\log\log\Delta\cdot\log n)$ space in the turnstile model.
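To make the definition concrete, the following is a minimal sketch that evaluates $\mathrm{EMD}(S,T)$ exactly by trying every one-to-one mapping $\pi$ under the $\ell_1$ cost. It is an illustration of the definition only, not either of the streaming algorithms described above; the function names `l1` and `emd_bruteforce` and the sample points are assumptions made for this example, and the factorial running time makes it usable only for very small $n$.

```python
from itertools import permutations


def l1(a, b):
    """L1 (Manhattan) distance between two points of the 2D grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def emd_bruteforce(S, T):
    """Exact EMD between equal-size multisets S and T (lists of (x, y) tuples).

    Enumerates every one-to-one mapping pi: S -> T and returns the minimum
    total L1 cost. Runs in O(n! * n) time, so it is for illustration only.
    """
    if len(S) != len(T):
        raise ValueError("S and T must contain the same number of points")
    indices = range(len(T))
    # Each permutation assigns S[i] to T[perm[i]], i.e. one candidate matching.
    return min(
        sum(l1(a, T[j]) for a, j in zip(S, perm))
        for perm in permutations(indices)
    )


# Hypothetical example: S has only k = 2 distinct points, (1, 1) appears twice.
S = [(1, 1), (1, 1), (4, 2)]
T = [(1, 2), (3, 2), (1, 1)]
print(emd_bruteforce(S, T))  # prints 2
```

Here the optimal matching sends one copy of $(1,1)$ to $(1,1)$ at cost 0, the other copy to $(1,2)$ at cost 1, and $(4,2)$ to $(3,2)$ at cost 1, for a total of 2.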
