Majorization for CRFs and Latent Likelihoods

The partition function plays a key role in probabilistic modeling, including conditional random fields, graphical models, and maximum likelihood estimation. To optimize partition functions, this article introduces a quadratic variational upper bound. This inequality facilitates majorization methods, which optimize a complicated function by iteratively solving simpler sub-problems. The bound remains efficient to compute even when the partition function involves a graphical model (with small tree-width) or arises in latent likelihood settings. For large-scale problems, low-rank versions of the bound are provided; these outperform L-BFGS as well as first-order methods. Several learning applications are shown to reduce to fast and convergent update rules. Experimental results show advantages over state-of-the-art optimization methods.
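
To make the majorization idea concrete, here is a minimal sketch. It is not the bound introduced in this article. It assumes a one-parameter log-linear model over a handful of outcomes and swaps in a simpler fixed-curvature quadratic bound: the second derivative of log Z(theta) is the variance of the feature, which never exceeds (f_max - f_min)^2 / 4. Minimizing that quadratic surrogate at each step gives a closed-form update that cannot increase the objective. All function and variable names are hypothetical.

```python
import math


def log_partition(theta, feats):
    """log Z(theta) = log sum_y exp(theta * f(y)), computed stably."""
    scores = [theta * f for f in feats]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def expected_feature(theta, feats):
    """E_theta[f], which is also the derivative of log Z at theta."""
    log_z = log_partition(theta, feats)
    return sum(f * math.exp(theta * f - log_z) for f in feats)


def mm_minimize(feats, target, theta=0.0, iters=200):
    """Minimize L(theta) = log Z(theta) - theta * target by majorization.

    Assumption: a constant curvature bound stands in for the article's
    variational bound. Since d^2/dtheta^2 log Z = Var_theta[f] <= c with
    c = (max f - min f)^2 / 4, the quadratic
        L(t0) + L'(t0) * (t - t0) + 0.5 * c * (t - t0)^2
    upper-bounds L(t) and touches it at t0.
    """
    curvature = (max(feats) - min(feats)) ** 2 / 4.0
    history = []
    for _ in range(iters):
        history.append(log_partition(theta, feats) - theta * target)
        grad = expected_feature(theta, feats) - target
        # Closed-form minimizer of the quadratic surrogate.
        theta -= grad / curvature
    return theta, history


if __name__ == "__main__":
    feats = [0.0, 1.0, 2.0]   # hypothetical feature value of each outcome
    theta, history = mm_minimize(feats, target=1.5)
    print(theta, expected_feature(theta, feats), history[-1])
```

At the minimizer the model's expected feature matches `target`, and the recorded objective values are non-increasing, which is the monotonicity property that majorization methods rely on. A tighter bound, such as the one the article proposes, would take larger steps while keeping this guarantee.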
