SampleRank: Training Factor Graphs with Atomic Gradients

We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain. As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but also achieves better accuracy in practice (up to 23% error reduction on noun-phrase coreference).

[1]  Rahul Gupta,et al.  Accurate max-margin training for structured output spaces , 2008, ICML '08.

[2]  D. Bertsekas Gradient convergence in gradient methods , 1997 .

[3]  Ruslan Salakhutdinov,et al.  Learning in Markov Random Fields using Tempered Transitions , 2009, NIPS.

[4]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[5]  Breck Baldwin,et al.  Algorithms for Scoring Coreference Chains , 1998 .

[6]  Thorsten Joachims,et al.  Training structural SVMs when exact inference is intractable , 2008, ICML '08.

[7]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[8]  Thorsten Joachims,et al.  Cutting-plane training of structural SVMs , 2009, Machine Learning.

[9]  Tommi S. Jaakkola,et al.  Learning Efficiently with Approximate Inference via Dual Losses , 2010, ICML.

[10]  Andrew McCallum,et al.  First-Order Probabilistic Models for Coreference Resolution , 2007, NAACL.

[11]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[12]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[13]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[14]  Tamir Hazan,et al.  Direct Loss Minimization for Structured Prediction , 2010, NIPS.

[15]  R. Rubinstein,et al.  An Efficient Stochastic Approximation Algorithm for Stochastic Saddle Point Problems , 2005 .

[16]  Tijmen Tieleman,et al.  Training restricted Boltzmann machines using approximations to the likelihood gradient , 2008, ICML '08.

[17]  Stuart J. Russell,et al.  BLOG: Relational Modeling with Unknown Objects , 2004 .

[18]  Benjamin Van Roy,et al.  On Constraint Sampling in the Linear Programming Approach to Approximate Dynamic Programming , 2004, Math. Oper. Res..

[19]  Michael L. Wick,et al.  Inference and Learning in Large Factor Graphs with Adaptive Proposal Distributions and a Rank-based Objective , 2009 .

[20]  Giuseppe Carlo Calafiore,et al.  Uncertain convex programs: randomized solutions and confidence levels , 2005, Math. Program..

[21]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[22]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[23]  Fernando Pereira,et al.  Structured Learning with Approximate Inference , 2007, NIPS.