Online Max-Margin Weight Learning for Markov Logic Networks

Most existing weight-learning algorithms for Markov Logic Networks (MLNs) use batch training, which becomes computationally expensive, and even infeasible, for very large datasets because the training examples may not fit in main memory. To overcome this problem, previous work has used online learning algorithms to learn weights for MLNs. However, this prior work has only applied existing online algorithms, and there is no comprehensive study of online weight learning for MLNs. In this paper, we derive new online algorithms for structured prediction using the primal-dual framework, apply them to learn weights for MLNs, and compare them against existing online algorithms on two large, real-world datasets. The experimental results show that the new algorithms achieve better accuracy than existing methods.
