First-Order Probabilistic Models for Coreference Resolution

Traditional noun phrase coreference resolution systems represent features only of pairs of noun phrases. In this paper, we propose a machine learning method that enables features over sets of noun phrases, resulting in a first-order probabilistic model for coreference. We outline a set of approximations that make this approach practical, and apply our method to the ACE coreference dataset, achieving a 45% error reduction over a comparable method that considers only features of pairs of noun phrases. This result illustrates how a first-order logic representation can be incorporated into a probabilistic model and scaled efficiently.
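The contrast between pairwise features and features over sets of noun phrases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mention fields (`head`, `gender`, `is_pronoun`), the feature names, and both functions are hypothetical and are not taken from the paper, which does not specify its feature set in this abstract.

```python
from itertools import combinations


def pairwise_features(a, b):
    """Features of a single pair of mentions (the traditional setting)."""
    return {
        "head_match": a["head"].lower() == b["head"].lower(),
        "gender_agree": a["gender"] == b["gender"],
    }


def set_features(mentions):
    """Features quantified over a whole candidate set of mentions.

    Hypothetical examples of first-order style features: each one is a
    universal or existential statement about every mention in the set.
    """
    known_genders = {m["gender"] for m in mentions if m["gender"] != "unknown"}
    return {
        # "For all mentions x in the set, x is a pronoun."
        "all_pronouns": all(m["is_pronoun"] for m in mentions),
        # "No two mentions in the set have conflicting known genders."
        "gender_consistent": len(known_genders) <= 1,
        # "There exists a pair in the set whose head words match."
        "some_pair_head_match": any(
            pairwise_features(a, b)["head_match"]
            for a, b in combinations(mentions, 2)
        ),
    }


# Illustrative data only; not drawn from the ACE dataset.
cluster = [
    {"head": "she", "gender": "female", "is_pronoun": True},
    {"head": "her", "gender": "female", "is_pronoun": True},
    {"head": "he", "gender": "male", "is_pronoun": True},
]
features = set_features(cluster)
```

A pairwise model scoring only ("she", "her") would see nothing wrong, whereas the set-level features expose that the candidate cluster mixes genders and contains no non-pronominal mention.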
