Bethe Projections for Non-Local Inference

Many inference problems in structured prediction are naturally solved by augmenting a tractable dependency structure with complex, non-local auxiliary objectives. This includes the mean field family of variational inference algorithms, soft- or hard-constrained inference using Lagrangian relaxation or linear programming, collective graphical models, and forms of semi-supervised learning such as posterior regularization. We present a method to discriminatively learn broad families of inference objectives, capturing powerful non-local statistics of the latent variables, while maintaining tractable and provably fast inference using non-Euclidean projected gradient descent with a distance-generating function given by the Bethe entropy. We demonstrate the performance and flexibility of our method by (1) extracting structured citations from research papers by learning soft global constraints, (2) achieving state-of-the-art results on a widely used handwriting recognition task using a novel learned non-convex inference procedure, and (3) providing a fast and highly scalable algorithm for the challenging problem of inference in a collective graphical model applied to bird migration.
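As a rough illustration of non-Euclidean projected gradient descent, the sketch below runs mirror descent over the probability simplex of a single discrete variable. It is a simplified stand-in, not the method of the paper: for a model with one variable and no edges, the Bethe entropy reduces to the Shannon entropy, so the distance-generating function here is the negative Shannon entropy and the update becomes multiplicative. The function name `entropic_mirror_descent`, the toy objective, and the values of `theta`, `target`, and `lam` are all hypothetical choices made for this example.

```python
import math


def entropic_mirror_descent(grad, mu0, step=0.2, iters=500):
    """Minimize a convex function over the probability simplex.

    Mirror descent with negative Shannon entropy as the
    distance-generating function (an assumption of this sketch).
    """
    mu = list(mu0)
    for _ in range(iters):
        g = grad(mu)
        # Gradient step in log space; entries of mu stay strictly positive.
        logits = [math.log(m) - step * gi for m, gi in zip(mu, g)]
        # Subtract the maximum before exponentiating for numerical stability.
        shift = max(logits)
        weights = [math.exp(l - shift) for l in logits]
        total = sum(weights)
        # Renormalizing is the KL (Bregman) projection onto the simplex.
        mu = [w / total for w in weights]
    return mu


# Hypothetical toy objective: a linear score plus a quadratic penalty
# pulling the marginals toward a target vector:
#   f(mu) = -<theta, mu> + (lam / 2) * ||mu - target||^2
theta = [1.0, 0.5, 0.0]
target = [0.2, 0.3, 0.5]
lam = 4.0


def grad(mu):
    return [-th + lam * (m - t) for th, m, t in zip(theta, mu, target)]


mu = entropic_mirror_descent(grad, [1.0 / 3, 1.0 / 3, 1.0 / 3])
print(mu)
```

The step size 0.2 is chosen conservatively for this toy problem; a structured model would replace the simplex with the local marginal polytope and the renormalization with marginal inference in the underlying tractable model.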
