Efficient Weight Learning for Markov Logic Networks

Markov logic networks (MLNs) combine Markov networks and first-order logic, and are a powerful and increasingly popular representation for statistical relational learning. The state-of-the-art method for discriminative learning of MLN weights is the voted perceptron algorithm, which is essentially gradient descent with an MPE approximation to the expected sufficient statistics (true clause counts). Unfortunately, these can vary widely between clauses, causing the learning problem to be highly ill-conditioned, and making gradient descent very slow. In this paper, we explore several alternatives, from per-weight learning rates to second-order methods. In particular, we focus on two approaches that avoid computing the partition function: diagonal Newton and scaled conjugate gradient. In experiments on standard SRL datasets, we obtain order-of-magnitude speedups, or more accurate models given comparable learning times.

[1]  Martin Fodslette Møller,et al.  A scaled conjugate gradient algorithm for fast supervised learning , 1993, Neural Networks.

[2]  Mark Craven,et al.  Relational Learning with Statistical Predicate Invention: Better Models for Hypertext , 2001, Machine Learning.

[3]  Barak A. Pearlmutter Fast Exact Multiplication by the Hessian , 1994, Neural Computation.

[4]  Bart Selman,et al.  A general stochastic approach to solving problems with hard and soft constraints , 1996, Satisfiability Problem: Theory and Applications.

[5]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[6]  Matthew Richardson,et al.  Markov logic networks , 2006, Machine Learning.

[7]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[8]  Pedro M. Domingos,et al.  Discriminative Training of Markov Logic Networks , 2005, AAAI.

[9]  R. Fletcher Practical Methods of Optimization , 1988 .

[10]  J. Besag On the Statistical Analysis of Dirty Pictures , 1986 .

[11]  Matthew Richardson,et al.  The Alchemy System for Statistical Relational AI: User Manual , 2007 .

[12]  Pedro M. Domingos,et al.  Sound and Efficient Inference with Probabilistic and Deterministic Dependencies , 2006, AAAI.

[13]  J. Shewchuk An Introduction to the Conjugate Gradient Method Without the Agonizing Pain , 1994 .

[14]  Andrew McCallum,et al.  Introduction to Statistical Relational Learning , 2007 .

[15]  Mark W. Schmidt,et al.  Accelerated training of conditional random fields with stochastic gradient methods , 2006, ICML.

[16]  Pedro M. Domingos,et al.  Entity Resolution with Markov Logic , 2006, Sixth International Conference on Data Mining (ICDM'06).

[17]  Ben Taskar,et al.  Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning) , 2007 .

[18]  Yann LeCun,et al.  Improving the convergence of back-propagation learning with second-order methods , 1989 .

[19]  Sylvia Richardson,et al.  Markov Chain Monte Carlo in Practice , 1997 .

[20]  Pedro M. Domingos,et al.  Learning the structure of Markov logic networks , 2005, ICML.

[21]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[22]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..