Sublinear Time Algorithms for Earth Mover’s Distance

We study the problem of estimating the Earth Mover’s Distance (EMD) between probability distributions when given access only to samples from the distributions. We give closeness testers and additive-error estimators over domains in [0,1]^d, with sample complexities independent of domain size, which permits testing even continuous distributions over infinite domains. Instead, the sample complexities of our algorithms depend on the dimension of the domain space and the quality of the result required. We also prove lower bounds for closeness testing, showing the dependencies on these parameters to be essentially optimal. Additionally, we consider whether natural classes of distributions exist for which there are algorithms with better dependence on the dimension, and show that for highly clusterable data, this is indeed the case. Lastly, we consider a variant of the EMD, defined over tree metrics instead of the usual ℓ₁ metric, and give tight upper and lower bounds.
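To make the sample-access setting concrete, the following is a minimal sketch of a plug-in EMD estimate in the simplest case, d = 1. It is not the tester or estimator from this paper. It relies only on the standard fact that, on the line, the EMD between two equal-size empirical distributions is the average gap between their sorted samples. The function name `empirical_emd_1d`, the sample size, and the two test distributions are assumptions chosen for illustration.

```python
import random


def empirical_emd_1d(xs, ys):
    """EMD between two equal-size empirical distributions on the line.

    In one dimension the optimal transport plan matches the i-th smallest
    point of xs to the i-th smallest point of ys, so the cost is the mean
    absolute difference of the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


# Illustrative setup (assumed, not from the paper): p is uniform on [0, 1]
# and q is uniform on [0, 0.5]. Only samples are visible to the estimator.
rng = random.Random(0)
n = 2000
p_samples = [rng.random() for _ in range(n)]
q_samples = [0.5 * rng.random() for _ in range(n)]

estimate = empirical_emd_1d(p_samples, q_samples)
print(f"estimated EMD from {n} samples per distribution: {estimate:.3f}")
```

The true EMD between these two distributions is 0.25, so the estimate should land near that value. The number of samples used never refers to the size of the domain, which is the point of the setting; how the required sample size grows with the dimension d and the target accuracy is what the paper's bounds address.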
