Lost Relatives of the Gumbel Trick

The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each time solving for the most likely configuration. We derive an entire family of related methods, of which the Gumbel trick is one member, and show that the new methods have superior properties in several settings with minimal additional computational cost. In particular, for the Gumbel trick to yield computational benefits for discrete graphical models, Gumbel perturbations on all configurations are typically replaced with so-called low-rank perturbations. We show how a subfamily of our new methods adapts to this setting, proving new upper and lower bounds on the log partition function and deriving a family of sequential samplers for the Gibbs distribution. Finally, we balance the discussion by showing how the simpler analytical form of the Gumbel trick enables additional theoretical results.
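As a concrete illustration of the mechanism described above (a sketch of the classical Gumbel-max trick, not code from the paper), the following NumPy snippet perturbs each unnormalized log-potential phi(x) of an explicit categorical distribution with i.i.d. Gumbel noise: the argmax of the perturbed potentials is an exact sample from the distribution, and the expected maximum recovers the log partition function up to the Euler-Mascheroni constant. The function name and test distribution are illustrative assumptions.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gumbel_trick(phi, n_trials=10_000, seed=0):
    """Illustrative Gumbel-max trick on explicit log-potentials.

    phi: 1-D array of unnormalized log-potentials phi(x).
    Returns exact samples from p(x) ∝ exp(phi(x)) and an
    estimate of the log partition function log Z.
    """
    rng = np.random.default_rng(seed)
    # One i.i.d. standard Gumbel(0, 1) perturbation of every configuration per trial.
    perturbed = phi + rng.gumbel(size=(n_trials, len(phi)))
    # The argmax of the perturbed potentials is an exact sample from p(x).
    samples = perturbed.argmax(axis=1)
    # E[max_x (phi(x) + gamma(x))] = log Z + Euler-Mascheroni constant.
    log_z_estimate = perturbed.max(axis=1).mean() - EULER_GAMMA
    return samples, log_z_estimate

# Sanity check on a small categorical distribution with known log Z = 0.
phi = np.log(np.array([0.1, 0.2, 0.7]))
samples, log_z = gumbel_trick(phi)
print(np.bincount(samples) / len(samples))  # empirical frequencies ≈ [0.1, 0.2, 0.7]
print(log_z)                                # ≈ 0.0
```

For discrete graphical models with exponentially many configurations, materializing one Gumbel variable per configuration as above is intractable, which is precisely why the low-rank perturbations mentioned in the abstract are used in that setting.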
