Marginal Inference in MRFs using Frank-Wolfe

We introduce an algorithm, based on the Frank-Wolfe technique (conditional gradient), that performs marginal inference in undirected graphical models by repeatedly calling a MAP inference routine. The algorithm minimizes standard Bethe-style convex variational objectives for inference, uses known MAP algorithms as black boxes, and offers a principled means to construct sparse approximate marginals for high-arity graphs. We also offer intuition and empirical evidence for a relationship between the entropy of the model's true marginal distribution and the convergence rate of the algorithm. We advocate further applications of Frank-Wolfe to marginal inference in Gibbs distributions with combinatorial energy functions.
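
To make the idea of "marginal inference by repeated MAP calls" concrete, here is a minimal sketch on a deliberately trivial model. It is an illustration under stated assumptions, not the paper's implementation. The model has independent binary variables and no edges, so the entropy term is exact and the MAP solver reduces to thresholding. The function names (`map_oracle`, `entropy_gradient`, `frank_wolfe_marginals`), the step-size schedule, and the starting point are all hypothetical choices made for this example.

```python
import math


def map_oracle(theta):
    """Hypothetical black-box MAP solver for a toy model with no edges:
    maximizes sum_i theta[i] * x[i] over x in {0, 1}^n."""
    return [1.0 if t > 0 else 0.0 for t in theta]


def entropy_gradient(mu):
    """Gradient of H(mu) = -sum_i [mu_i log mu_i + (1 - mu_i) log(1 - mu_i)]."""
    return [math.log((1.0 - m) / m) for m in mu]


def frank_wolfe_marginals(theta, num_iters=2000):
    """Minimize f(mu) = -<theta, mu> - H(mu) over [0, 1]^n with Frank-Wolfe."""
    mu = [0.5] * len(theta)  # assumed interior starting point
    for k in range(1, num_iters + 1):
        # -grad f(mu) = theta + grad H(mu). Minimizing the linearized
        # objective over the polytope is a MAP call on these parameters.
        perturbed = [t + g for t, g in zip(theta, entropy_gradient(mu))]
        vertex = map_oracle(perturbed)
        # Step size below 1, so mu stays strictly inside (0, 1) and the
        # entropy gradient remains finite.
        gamma = 2.0 / (k + 2.0)
        mu = [(1.0 - gamma) * m + gamma * v for m, v in zip(mu, vertex)]
    return mu


theta = [1.5, -0.5, 0.0, 2.0]
approx = frank_wolfe_marginals(theta)
# For independent binary variables the exact marginals are sigmoids.
exact = [1.0 / (1.0 + math.exp(-t)) for t in theta]
```

In this toy case `approx` can be checked against `exact`. Each iterate is a convex combination of the starting point and the MAP solutions returned so far, which is one way to read the abstract's remark about sparse approximate marginals. In a model with edges, `map_oracle` would be replaced by a real MAP solver and the entropy by a convex Bethe-style surrogate.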
