SampleRank: Learning Preferences from Atomic Gradients

Large templated factor graphs with complex structure that changes during inference have been shown to provide state-of-the-art experimental results on tasks such as identity uncertainty and information integration. However, learning parameters in these models is difficult because computing the gradients requires expensive inference routines. In this paper we propose an online algorithm that instead learns preferences over hypotheses from the gradients between the atomic steps of inference. Although there are a combinatorial number of ranking constraints over the entire hypothesis space, a connection to the framework of sampled convex programs reveals a polynomial bound on the number of rankings that need to be satisfied in practice. We further apply ideas from passive-aggressive algorithms to our update rules, enabling us to extend recent work in confidence-weighted classification to structured prediction problems. We compare our algorithm to the structured perceptron, contrastive divergence, and persistent contrastive divergence, demonstrating substantial error reductions on two real-world problems (20% over contrastive divergence).
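To make the idea of learning from atomic steps of inference concrete, below is a minimal sketch of a SampleRank-style parameter update for a single MCMC proposal. It assumes a simple linear model with numpy feature vectors, a hypothetical objective score (e.g., F1 against ground truth) already computed for each hypothesis, and a plain perceptron-style step; the paper's passive-aggressive and confidence-weighted variants replace the fixed learning rate with margin- and confidence-sensitive updates.

```python
import numpy as np

def samplerank_update(w, feats_a, feats_b, obj_a, obj_b, lr=1.0):
    """One atomic SampleRank-style update (perceptron variant, illustrative only).

    feats_a, feats_b: feature vectors of the current and proposed hypotheses.
    obj_a, obj_b: objective scores of the two hypotheses (e.g., accuracy vs. truth).
    The weights change only when the model's ranking of the two neighboring
    hypotheses disagrees with the ranking induced by the objective.
    """
    model_prefers_a = w @ feats_a > w @ feats_b
    objective_prefers_a = obj_a > obj_b
    if obj_a != obj_b and model_prefers_a != objective_prefers_a:
        better, worse = (feats_a, feats_b) if objective_prefers_a else (feats_b, feats_a)
        w = w + lr * (better - worse)
    return w

# Toy usage: two 3-dimensional feature vectors and their (hypothetical) objective scores.
w = np.zeros(3)
w = samplerank_update(w,
                      np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0]),
                      obj_a=0.8, obj_b=0.3)
```

Because the update touches only the two hypotheses involved in a single proposal, it avoids the expensive full-inference gradient computations mentioned above; the sampled-convex-program connection is what justifies satisfying only the rankings encountered along the sampling trajectory rather than all combinatorially many pairs.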
