Sinkhorn Distances: Lightspeed Computation of Optimal Transport

Optimal transport distances are a fundamental family of distances for probability measures and histograms of features. Despite their appealing theoretical properties, excellent performance in retrieval tasks and intuitive formulation, their computation involves the resolution of a linear program whose cost can quickly become prohibitive whenever the size of the support of these measures or the histograms' dimension exceeds a few hundred. We propose in this work a new family of optimal transport distances that look at transport problems from a maximum-entropy perspective. We smooth the classic optimal transport problem with an entropic regularization term, and show that the resulting optimum is also a distance which can be computed through Sinkhorn's matrix scaling algorithm at a speed that is several orders of magnitude faster than that of transport solvers. We also show that this regularized distance improves upon classic optimal transport distances on the MNIST classification problem.
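The matrix scaling procedure mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sinkhorn_distance`, the regularization weight `lam`, the stopping tolerance `tol`, and the iteration cap `n_iters` are all hypothetical choices that do not appear in the abstract. The sketch builds the kernel exp(-lam * M) from a cost matrix M, alternately rescales its rows and columns until they match the two histograms, and returns the transport cost of the resulting plan.

```python
import numpy as np

def sinkhorn_distance(r, c, M, lam=2.0, n_iters=1000, tol=1e-9):
    """Entropy-smoothed transport cost between histograms r and c.

    r, c : 1-D arrays of strictly positive weights, each summing to 1.
    M    : cost matrix of shape (len(r), len(c)).
    lam  : regularization weight (assumed name); larger values put less
           weight on the entropy term but slow convergence.
    Returns (cost, P), where P is the approximate transport plan.
    """
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    M = np.asarray(M, dtype=float)

    K = np.exp(-lam * M)          # elementwise kernel; may underflow for large lam
    u = np.ones_like(r)           # row scaling vector

    for _ in range(n_iters):
        v = c / (K.T @ u)         # rescale columns to match c
        u_new = r / (K @ v)       # rescale rows to match r
        converged = np.max(np.abs(u_new - u)) < tol
        u = u_new
        if converged:
            break

    v = c / (K.T @ u)             # final column rescaling
    P = u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)
    return float(np.sum(P * M)), P


# Usage: three bins on a line, with ground cost |i - j|.
r = np.array([0.5, 0.3, 0.2])
c = np.array([0.2, 0.3, 0.5])
M = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
cost, P = sinkhorn_distance(r, c, M, lam=2.0)
```

Each iteration costs only two matrix-vector products, which is where the speed advantage over a linear programming solver comes from. Because the returned plan satisfies the marginal constraints only approximately and carries extra entropy, its cost is an upper estimate of the unregularized optimal transport cost.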
