Advances in Learning and Inference for Partition-wise Models of Coreference Resolution

Noun phrase coreference resolution is a difficult task that has driven research in both natural language processing and machine learning. It has been the subject of both feature engineering and model development, including partition-wise conditional random fields with Markov-chain Monte Carlo inference. In this paper we combine the latest feature engineering with an exploration of machine learning advances in three areas: the proposal distribution for inference, the ground-truth evaluation signal for the objective function, and the online update rule for parameter estimation. In particular, we investigate learned, adaptive proposal distributions; we evaluate various methods of ranking possible worlds; and we adapt a recently proposed confidence-weighted classification method to structured prediction with SampleRank. We achieve new best results on ACE 2004, surpassing two previous state-of-the-art systems with error reductions of 10% and 3%, respectively.
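To make the interplay of proposals, the ground-truth ranking signal, and the online update more concrete, the following is a minimal sketch of a SampleRank-style training loop on a toy partitioning problem. It is not the system described above. Every name in it is hypothetical, the two-dimensional pairwise features are invented for illustration, the proposal distribution is uniform rather than learned, the acceptance step is greedy rather than a Metropolis-Hastings ratio, the ranking signal is assumed to be pairwise accuracy, and a plain perceptron step stands in for the confidence-weighted update.

```python
import random
from itertools import combinations

def pair_features(a, b):
    # Hypothetical pairwise features: [heads match, heads differ].
    return (1.0, 0.0) if a == b else (0.0, 1.0)

def world_features(mentions, clusters):
    # Sum pairwise features over all mention pairs placed in the same cluster.
    total = [0.0, 0.0]
    for i, j in combinations(range(len(mentions)), 2):
        if clusters[i] == clusters[j]:
            f = pair_features(mentions[i], mentions[j])
            total[0] += f[0]
            total[1] += f[1]
    return total

def pairwise_accuracy(clusters, gold):
    # Assumed ground-truth signal: fraction of mention pairs whose
    # same-cluster / different-cluster decision agrees with the gold partition.
    pairs = list(combinations(range(len(gold)), 2))
    agree = sum((clusters[i] == clusters[j]) == (gold[i] == gold[j])
                for i, j in pairs)
    return agree / len(pairs)

def score(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def samplerank(mentions, gold, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    n = len(mentions)
    clusters = list(range(n))  # start with every mention in its own cluster
    w = [0.0, 0.0]
    for _ in range(steps):
        # Uniform proposal: move one random mention to a random cluster id.
        proposal = clusters[:]
        proposal[rng.randrange(n)] = rng.randrange(n)

        f_cur = world_features(mentions, clusters)
        f_new = world_features(mentions, proposal)
        obj_cur = pairwise_accuracy(clusters, gold)
        obj_new = pairwise_accuracy(proposal, gold)
        s_cur, s_new = score(w, f_cur), score(w, f_new)

        # Update only when the model fails to rank the two worlds
        # the way the ground-truth signal does.
        if obj_new > obj_cur and s_new <= s_cur:
            w = [wi + lr * (a - b) for wi, a, b in zip(w, f_new, f_cur)]
        elif obj_new < obj_cur and s_new >= s_cur:
            w = [wi + lr * (a - b) for wi, a, b in zip(w, f_cur, f_new)]

        # Greedy walk guided by the (possibly updated) model score.
        if score(w, f_new) >= score(w, f_cur):
            clusters = proposal
    return w

if __name__ == "__main__":
    # Toy data: mentions sharing a head string are coreferent in the gold partition.
    weights = samplerank(["a", "a", "b", "b", "c"], [0, 0, 1, 1, 2])
    print(weights)
```

In this toy setting every update pushes the weight of the matching-head feature up relative to the differing-head feature, because the gold partition groups mentions exactly by head string. Real coreference features and evaluation metrics are far richer, so the sketch only illustrates the control flow.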
