Implicit Differentiation by Perturbation

This paper proposes a simple and efficient finite difference method for implicit differentiation of marginal inference results in discrete graphical models. Given an arbitrary loss function defined on the marginals, we show that the derivatives of this loss with respect to the model parameters can be obtained by running the inference procedure twice, on slightly perturbed model parameters. The method can also be used with approximate inference, in which case the loss is defined over the approximate marginals. Convenient choices of loss function make it practical to fit graphical models with hidden variables, high treewidth, and/or model misspecification.
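As an illustration of the idea, the sketch below estimates the loss gradient on a tiny three-variable chain model where the marginals can be computed exactly by enumeration. It relies on an assumption not spelled out in the abstract: for exact inference in an exponential family, the Jacobian of the marginals mu with respect to the parameters theta is a covariance matrix and hence symmetric, so dL/dtheta can be approximated by (mu(theta + r g) - mu(theta - r g)) / (2r), where g = dL/dmu. The function names, the two-sided form of the difference, the step size r, and the squared-error loss are all illustrative choices, not necessarily those of the paper.

```python
import itertools
import numpy as np

def marginals(theta, features):
    """Exact mean parameters mu(theta) = E[f(x)], by brute-force enumeration."""
    scores = features @ theta
    scores -= scores.max()          # stabilize the exponentials
    p = np.exp(scores)
    p /= p.sum()
    return features.T @ p

def loss_grad_by_perturbation(theta, features, loss_grad_mu, r=1e-5):
    """Approximate dL/dtheta using two inference runs on perturbed parameters.

    Assumes the Jacobian d mu / d theta is symmetric (true for exact inference).
    """
    g = loss_grad_mu(marginals(theta, features))   # dL/dmu at the current marginals
    mu_plus = marginals(theta + r * g, features)   # first perturbed inference run
    mu_minus = marginals(theta - r * g, features)  # second perturbed inference run
    return (mu_plus - mu_minus) / (2 * r)

# Hypothetical toy model: 3 binary variables on a chain, with unary
# features x_i and pairwise features x_i * x_j on edges (0,1) and (1,2).
states = np.array(list(itertools.product([0, 1], repeat=3)), dtype=float)
edges = [(0, 1), (1, 2)]
pair = np.stack([states[:, i] * states[:, j] for i, j in edges], axis=1)
features = np.hstack([states, pair])               # shape (8, 5)

theta = np.array([0.2, -0.5, 0.1, 0.8, -0.3])
target = np.array([0.6, 0.4, 0.5, 0.3, 0.2])       # made-up target marginals

# Illustrative loss L(mu) = 0.5 * ||mu - target||^2, so dL/dmu = mu - target.
grad = loss_grad_by_perturbation(theta, features, lambda mu: mu - target)
print(grad)
```

The perturbation direction is the loss gradient with respect to the marginals, so the cost is a fixed number of inference calls regardless of the number of parameters. With an approximate inference routine in place of `marginals`, the same code would differentiate a loss over approximate marginals, provided the symmetry assumption still holds for that routine.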
