Learning Latent Permutations with Gumbel-Sinkhorn Networks

Permutations and matchings are core building blocks in a variety of latent variable models, as they allow us to align, canonicalize, and sort data. Learning in such models is difficult, however, because exact marginalization over these combinatorial objects is intractable. In response, this paper introduces a collection of new methods for end-to-end learning in such models that approximate discrete maximum-weight matching using the continuous Sinkhorn operator. Sinkhorn iteration is attractive because it functions as a simple, easy-to-implement analog of the softmax operator. With this, we can define the Gumbel-Sinkhorn method, an extension of the Gumbel-Softmax method (Jang et al. 2016, Maddison et al. 2016) to distributions over latent matchings. We demonstrate the effectiveness of our method by outperforming competitive baselines on a range of qualitatively different tasks: sorting numbers, solving jigsaw puzzles, and identifying neural signals in worms.
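
The idea of Sinkhorn iteration as a matrix analog of softmax can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `sinkhorn` and `gumbel_sinkhorn`, the temperature parameter `tau`, and the fixed iteration count `n_iters` are assumptions made for this illustration. The sketch alternately normalizes the rows and columns of a score matrix in log space, which pushes it toward a doubly stochastic matrix, and then adds Gumbel noise to the scores before normalizing to draw a relaxed sample.

```python
import numpy as np


def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis, keeping dimensions.
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))


def sinkhorn(log_alpha, n_iters=50):
    # Hypothetical helper: alternate row and column normalization in log
    # space. The result has nonnegative entries, columns that sum to 1,
    # and rows that approach sum 1 as n_iters grows.
    log_alpha = np.asarray(log_alpha, dtype=float)
    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1)  # rows
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0)  # columns
    return np.exp(log_alpha)


def gumbel_sinkhorn(log_alpha, tau=1.0, n_iters=50, rng=None):
    # Hypothetical helper: perturb the scores with i.i.d. Gumbel(0, 1)
    # noise, divide by the temperature tau (an assumed parameter), and
    # apply the Sinkhorn normalization above.
    rng = np.random.default_rng() if rng is None else rng
    log_alpha = np.asarray(log_alpha, dtype=float)
    noise = rng.gumbel(size=log_alpha.shape)
    return sinkhorn((log_alpha + noise) / tau, n_iters=n_iters)


if __name__ == "__main__":
    scores = np.array([[0.0, 1.0, 0.5],
                       [1.0, 0.0, 0.2],
                       [0.3, 0.7, 0.0]])
    sample = gumbel_sinkhorn(scores, tau=0.5, rng=np.random.default_rng(0))
    print(sample.round(3))
    print("column sums:", sample.sum(axis=0))
```

In this sketch, a smaller `tau` tends to push the output closer to a hard permutation matrix, at the cost of needing more iterations for the row sums to settle, which mirrors the role of temperature in Gumbel-Softmax.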

[1]  J. Munkres Algorithms for the Assignment and Transportation Problems , 1957 .

[2]  Tim Rocktäschel,et al.  End-to-end Differentiable Proving , 2017, NIPS.

[3]  C. R. Rao,et al.  Convexity properties of entropy functions and analysis of diversity , 1984 .

[4]  Richard Zemel,et al.  Efficient Feature Learning Using Perturb-and-MAP , 2013 .

[5]  Roberto Cominetti,et al.  Asymptotic analysis of the exponential penalty trajectory in linear programming , 1994, Math. Program..

[6]  Jakub M. Tomczak On some properties of the low-dimensional Gumbel perturbations in the Perturb-and-MAP model , 2016 .

[7]  R. Tyrrell Rockafellar,et al.  Convex Analysis , 1970, Princeton Landmarks in Mathematics and Physics.

[8]  Noah A. Smith,et al.  Transition-Based Dependency Parsing with Stack Long Short-Term Memory , 2015, ACL.

[9]  Manfred K. Warmuth,et al.  Learning Permutations with Exponential Weights , 2007, COLT.

[10]  Tomas Mikolov,et al.  Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets , 2015, NIPS.

[11]  Pushmeet Kohli,et al.  TerpreT: A Probabilistic Programming Language for Program Induction , 2016, ArXiv.

[12]  Subhransu Maji,et al.  On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations , 2013, NIPS.

[13]  Gabriel Peyré,et al.  Sinkhorn-AutoDiff: Tractable Wasserstein Learning of Generative Models , 2017 .

[14]  Paolo Favaro,et al.  Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles , 2016, ECCV.

[15]  Quoc V. Le,et al.  Neural Programmer: Inducing Latent Programs with Gradient Descent , 2015, ICLR.

[16]  Justin Domke,et al.  Learning Graphical Model Parameters with Approximate Marginal Inference , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Han Zhang,et al.  Improving GANs Using Optimal Transport , 2018, ICLR.

[18]  Ryan P. Adams,et al.  Ranking via Sinkhorn Propagation , 2011, ArXiv.

[19]  Lav R. Varshney,et al.  Structural Properties of the Caenorhabditis elegans Neuronal Network , 2009, PLoS Comput. Biol..

[20]  Alan L. Yuille,et al.  The invisible hand algorithm: Solving the assignment problem with statistical physics , 1994, Neural Networks.

[21]  Dustin Tran,et al.  Operator Variational Inference , 2016, NIPS.

[22]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[23]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[24]  Dustin Tran,et al.  Deep and Hierarchical Implicit Models , 2017, ArXiv.

[25]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[26]  David M. Blei,et al.  Variational Inference: A Review for Statisticians , 2016, ArXiv.

[27]  George Papandreou,et al.  Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models , 2011, 2011 International Conference on Computer Vision.

[28]  Andrew McCallum,et al.  Bethe Projections for Non-Local Inference , 2015, UAI.

[29]  Anoop Cherian,et al.  DeepPermNet: Visual Permutation Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Veselin Stoyanov,et al.  Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.

[31]  Alex Graves,et al.  Neural Turing Machines , 2014, ArXiv.

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  Nicholas Ruozzi,et al.  Bethe Learning of Conditional Random Fields via MAP Decoding , 2015, ArXiv.

[34]  H. Kuhn The Hungarian method for the assignment problem , 1955 .

[35]  Ferenc Huszár,et al.  Variational Inference using Implicit Distributions , 2017, ArXiv.

[36]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[37]  Tommi S. Jaakkola,et al.  Approximate inference using conditional entropy decompositions , 2007, AISTATS.

[38]  Tim Rocktäschel,et al.  Programming with a Differentiable Forth Interpreter , 2016, ICML.

[39]  Yee Whye Teh,et al.  The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables , 2016, ICLR.

[40]  Gabriel Peyré,et al.  Learning Generative Models with Sinkhorn Divergences , 2017, AISTATS.

[41]  Samy Bengio,et al.  Order Matters: Sequence to sequence for sets , 2015, ICLR.

[42]  Richard Sinkhorn A Relationship Between Arbitrary Positive Matrices and Doubly Stochastic Matrices , 1964 .

[43]  Bert Huang,et al.  Approximating the Permanent with Belief Propagation , 2009, ArXiv.

[44]  C. Villani Topics in Optimal Transportation , 2003 .

[45]  Zoubin Ghahramani,et al.  Lost Relatives of the Gumbel Trick , 2017, ICML.

[46]  Scott W. Linderman,et al.  Reparameterizing the Birkhoff Polytope for Variational Permutation Inference , 2017, AISTATS.

[47]  Philip A. Knight,et al.  The Sinkhorn-Knopp Algorithm: Convergence and Applications , 2008, SIAM J. Matrix Anal. Appl..

[48]  Jin Yu,et al.  Exponential Family Graph Matching and Ranking , 2009, NIPS.

[49]  Ben Poole,et al.  Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.

[50]  Tommi S. Jaakkola,et al.  On the Partition Function and Random Maximum A-Posteriori Perturbations , 2012, ICML.

[51]  Richard Sinkhorn,et al.  Concerning nonnegative matrices and doubly stochastic matrices , 1967 .

[52]  Samy Bengio,et al.  Neural Combinatorial Optimization with Reinforcement Learning , 2016, ICLR.