Sinkhorn Networks: Using Optimal Transport Techniques to Learn Permutations

Recently, Optimal Transport (OT) has received significant attention in the Machine Learning community. It has been shown to be useful as a tool for generative modeling, in which the density estimation problem is cast as the minimization of a linear function on the transportation polytope. Entropy regularization of this problem (Cuturi, 2013) has been demonstrated to be particularly useful, as its solution can be characterized in terms of the Sinkhorn operator, which (i) can be computed more efficiently than a solution to the original problem and (ii) enables efficient automatic differentiation (AD). We show that this technique extends to the Birkhoff polytope, and we use it to understand the solution of the linear assignment problem as a limit of the Sinkhorn operator. This observation justifies and enables the use of AD in computation graphs containing permutations as intermediate representations. As a result, we are able to introduce Sinkhorn networks for learning permutations, extending the work of Adams & Zemel (2011), and apply them to a variety of tasks. The success of our extension suggests that entropy regularization might be applied to other polytopes as well, enabling AD through other discrete structures.
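To make the limit statement concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual definition of the Sinkhorn operator as repeated row and column normalization of exp(X / tau), carried out here in the log domain for numerical stability. The names `sinkhorn`, `best_permutation`, `tau`, and `n_iters`, as well as the 3x3 score matrix, are hypothetical choices made for illustration. The linear assignment problem is solved by brute force so that the two results can be compared on a tiny example.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(X, tau=1.0, n_iters=200):
    """Approximate the Sinkhorn operator applied to X / tau.

    Alternately normalizes the rows and columns of exp(X / tau) in the
    log domain. The result is approximately doubly stochastic.
    """
    n = len(X)
    log_p = [[x / tau for x in row] for row in X]
    for _ in range(n_iters):
        # Row normalization: make each row sum to 1.
        log_p = [[v - logsumexp(row) for v in row] for row in log_p]
        # Column normalization: make each column sum to 1.
        col_lse = [logsumexp([log_p[i][j] for i in range(n)]) for j in range(n)]
        log_p = [[log_p[i][j] - col_lse[j] for j in range(n)] for i in range(n)]
    return [[math.exp(v) for v in row] for row in log_p]


def best_permutation(X):
    """Brute-force linear assignment: maximize sum_i X[i][perm[i]]."""
    n = len(X)
    return max(
        itertools.permutations(range(n)),
        key=lambda perm: sum(X[i][perm[i]] for i in range(n)),
    )


# Hypothetical score matrix; row i prefers column perm[i].
X = [
    [0.9, 1.0, 0.1],
    [0.2, 0.8, 0.7],
    [1.0, 0.3, 0.2],
]

hard_perm = best_permutation(X)      # exact assignment (discrete)
diffuse = sinkhorn(X, tau=1.0)       # high temperature: spread-out matrix
soft = sinkhorn(X, tau=0.1)          # low temperature: close to a permutation

# Round the low-temperature matrix by taking the largest entry in each row.
soft_perm = tuple(max(range(len(row)), key=lambda j: row[j]) for row in soft)

print("assignment:", hard_perm)
print("rounded Sinkhorn:", soft_perm)
for row in soft:
    print([round(v, 3) for v in row])
```

In this sketch, lowering `tau` pushes the doubly stochastic output toward a vertex of the Birkhoff polytope, and every step is a composition of differentiable operations. Very small temperatures tend to need many more normalization iterations to converge, so `tau` and `n_iters` usually have to be tuned together.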

[1]  Ryan P. Adams, et al. Ranking via Sinkhorn Propagation, 2011, ArXiv.

[2]  Richard Sinkhorn. A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices, 1964.

[3]  Paolo Favaro, et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles, 2016, ECCV.

[4]  J. Munkres. Algorithms for the Assignment and Transportation Problems, 1957.

[5]  Yee Whye Teh, et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, 2016, ICLR.

[6]  Julien Rabin, et al. Regularized Discrete Optimal Transport, 2013, SIAM J. Imaging Sci.

[7]  Geoffrey E. Hinton, et al. ImageNet Classification with Deep Convolutional Neural Networks, 2012, Commun. ACM.

[8]  C. R. Rao, et al. Convexity Properties of Entropy Functions and Analysis of Diversity, 1984.

[9]  Ryan P. Adams, et al. Cardinality Restricted Boltzmann Machines, 2012, NIPS.

[10]  Klaus-Robert Müller, et al. Wasserstein Training of Restricted Boltzmann Machines, 2016, NIPS.

[11]  C. Villani. Optimal Transport: Old and New, 2008.

[12]  Brendan J. Frey, et al. Fast Exact Inference for Recursive Cardinality Models, 2012, UAI.

[13]  Gabriel Peyré, et al. Sinkhorn-AutoDiff: Tractable Wasserstein Learning of Generative Models, 2017.

[14]  Samy Bengio, et al. Order Matters: Sequence to Sequence for Sets, 2015, ICLR.

[15]  Manfred K. Warmuth, et al. Learning Permutations with Exponential Weights, 2007, COLT.

[16]  Harold W. Kuhn, et al. The Hungarian Method for the Assignment Problem, 1955, 50 Years of Integer Programming.

[17]  C. Villani. Topics in Optimal Transportation, 2003.

[18]  Ben Poole, et al. Categorical Reparameterization with Gumbel-Softmax, 2016, ICLR.

[19]  Alan L. Yuille, et al. The Invisible Hand Algorithm: Solving the Assignment Problem with Statistical Physics, 1994, Neural Networks.

[20]  Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.

[21]  Steven Gold, et al. Softmax to Softassign: Neural Network Algorithms for Combinatorial Optimization, 1996.

[22]  Anoop Cherian, et al. DeepPermNet: Visual Permutation Learning, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Marco Cuturi, et al. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.

[24]  丸山 徹. On a Few Developments in Convex Analysis, 1977.