Earth Mover's Distance based Similarity Search at Scale

Earth Mover's Distance (EMD) has received considerable attention as a similarity measure in multimedia and probabilistic databases, computer vision, image retrieval, and machine learning. Computed on multidimensional histograms, EMD distinguishes the objects that the histograms approximate (e.g., images) better than classic measures such as Euclidean distance. EMD is, however, expensive to compute, so a number of effective filtering methods have been proposed to reduce the number of histogram pairs for which the exact EMD must be computed during similarity search. Even so, the EMD calculations in the refinement step remain the bottleneck of the whole search process. In this paper, we focus on optimizing the refinement phase of EMD-based similarity search by (i) adapting an efficient min-cost flow algorithm (SIA) for EMD computation, (ii) proposing a dynamic distance bound that can be used to terminate an EMD refinement early, and (iii) proposing a dynamic refinement order for the candidates which, paired with a concurrent EMD refinement strategy, reduces the amount of needless computation. Our techniques are orthogonal to state-of-the-art filtering techniques and can be easily integrated with them, reducing the cost of EMD-based similarity queries by orders of magnitude.
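
To make the filter-and-refine setting concrete, the following minimal Python sketch may help. It is not the SIA-based algorithm or the dynamic bound proposed in the paper. It assumes the one-dimensional special case (equal-mass histograms, ground distance |i - j|), where the exact EMD reduces to a sum over cumulative differences, and it uses the classic centroid lower bound as the filter. All function names (`emd_1d`, `centroid_lower_bound`, `knn_filter_refine`) are hypothetical and chosen for illustration only.

```python
from itertools import accumulate
import heapq
import math


def emd_1d(p, q):
    """Exact EMD of two equal-mass 1-D histograms, ground distance |i - j|."""
    # In one dimension the optimal transport cost equals the sum of absolute
    # differences between the two cumulative histograms.
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def euclidean(p, q):
    """Bin-by-bin Euclidean distance, shown only for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid_lower_bound(p, q):
    """Cheap filter (assumed here): centroid distance never exceeds the EMD."""
    return abs(sum(i * (a - b) for i, (a, b) in enumerate(zip(p, q))))


def knn_filter_refine(query, database, k):
    """Return (sorted list of (emd, index) for the k nearest, #exact EMDs)."""
    # Filter step: order candidates by ascending lower bound.
    candidates = sorted(
        (centroid_lower_bound(query, h), i) for i, h in enumerate(database)
    )
    best = []      # max-heap on distance, stored as (-emd, index)
    refined = 0    # number of exact EMD computations performed
    for lower_bound, i in candidates:
        # Stop once no remaining candidate can beat the current k-th result.
        if len(best) == k and lower_bound >= -best[0][0]:
            break
        d = emd_1d(query, database[i])  # refinement step (the costly part)
        refined += 1
        heapq.heappush(best, (-d, i))
        if len(best) > k:
            heapq.heappop(best)
    return sorted((-neg_d, i) for neg_d, i in best), refined


# Usage: mass moved by one bin versus mass moved by five bins.
query = [1, 0, 0, 0, 0, 0]
near = [0, 1, 0, 0, 0, 0]
mid = [0, 0, 1, 0, 0, 0]
far = [0, 0, 0, 0, 0, 1]

# Euclidean distance cannot tell `near` from `far`; EMD can.
print(euclidean(query, near) == euclidean(query, far))  # True
print(emd_1d(query, near), emd_1d(query, far))          # 1 5

# 1-NN search: only one exact EMD is computed, the rest are filtered out.
print(knn_filter_refine(query, [far, near, mid], k=1))  # ([(1, 1)], 1)
```

In this toy setting the refinement is trivial; for multidimensional histograms each refinement requires solving a min-cost flow problem, which is why reducing and shortening refinements matters.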
