This paper describes the structure of the LTH coreference solver used in the closed track of the CoNLL 2012 shared task (Pradhan et al., 2012). The core of the solver is a mention classifier that uses the algorithm of Soon et al. (2001) and features extracted from the dependency graphs of the sentences.
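As a rough illustration of the closest-first linking strategy associated with Soon et al. (2001), the sketch below scans backwards from each mention and links it to the nearest preceding mention that a pairwise classifier accepts. The function names and the string-match classifier are hypothetical stand-ins chosen for this example; they are not the LTH solver's classifier or features.

```python
from typing import Callable, List, Optional


def closest_first_links(
    mentions: List[str],
    is_coreferent: Callable[[str, str], bool],
) -> List[Optional[int]]:
    """Return, for each mention, the index of its chosen antecedent or None."""
    links: List[Optional[int]] = []
    for j in range(len(mentions)):
        antecedent = None
        # Scan candidates from the nearest preceding mention backwards
        # and stop at the first one the classifier accepts.
        for i in range(j - 1, -1, -1):
            if is_coreferent(mentions[i], mentions[j]):
                antecedent = i
                break
        links.append(antecedent)
    return links


def clusters_from_links(links: List[Optional[int]]) -> List[List[int]]:
    """Group mention indices into clusters by following antecedent links."""
    cluster_of = {}
    clusters: List[List[int]] = []
    for j, i in enumerate(links):
        if i is None:
            # No antecedent found: the mention starts a new cluster.
            cluster_of[j] = len(clusters)
            clusters.append([j])
        else:
            # Join the cluster of the chosen antecedent.
            cluster_of[j] = cluster_of[i]
            clusters[cluster_of[i]].append(j)
    return clusters


# Toy pairwise classifier (assumption): exact case-insensitive string match.
def toy_classifier(antecedent: str, anaphor: str) -> bool:
    return antecedent.lower() == anaphor.lower()


mentions = ["Obama", "the president", "Obama", "Paris"]
links = closest_first_links(mentions, toy_classifier)
clusters = clusters_from_links(links)
print(links)
print(clusters)
```

In a real system the toy classifier would be replaced by a trained model over mention-pair features; the linking loop itself would stay the same.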
This system builds on the solver of Bjorkelund and Nugues (2011), which we extended so that it can be applied to the three languages of the task: English, Chinese, and Arabic. We designed a new mention detection module that removes pleonastic pronouns, prunes constituents, and recovers mentions when they do not exactly match a noun phrase. We redesigned the features so that they reflect more complex linguistic phenomena as well as discourse properties. Finally, we introduced a minimal cluster model grounded in the first mention of an entity.
We optimized the feature sets for each of the three languages: we carried out an extensive evaluation of pairs of features and complemented the single features with associations that improved the CoNLL score. On English, Chinese, and Arabic, respectively, we obtained scores of 59.57, 56.62, and 48.25 on the development set and 59.36, 56.85, and 49.43 on the test set, for a combined official score of 55.21.
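The reported figures are consistent with the combined official score being the unweighted mean of the three per-language test scores. The short check below makes that arithmetic explicit; the averaging rule is an assumption inferred from the numbers above, and the variable and function names are illustrative only.

```python
# Per-language test-set scores as reported in the abstract.
test_scores = {"English": 59.36, "Chinese": 56.85, "Arabic": 49.43}


def combined_score(scores: dict) -> float:
    """Unweighted mean over languages (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


official = round(combined_score(test_scores), 2)
print(official)  # 55.21
```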
[1] Nizar Habash et al. CATiB: The Columbia Arabic Treebank. ACL, 2009.
[2] Chih-Jen Lin et al. LIBLINEAR: A Library for Large Linear Classification. J. Mach. Learn. Res., 2008.
[3] Nianwen Xue et al. CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes. CoNLL Shared Task, 2011.
[4] Richard Johansson et al. Extended Constituent-to-Dependency Conversion for English. NODALIDA, 2007.
[5] Joakim Nivre et al. Inductive Dependency Parsing. Text, Speech and Language Technology, 2006.
[6] Yuchen Zhang et al. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes. EMNLP-CoNLL Shared Task, 2012.
[7] Hwee Tou Ng et al. A Machine Learning Approach to Coreference Resolution of Noun Phrases. CL, 2001.
[8] Pierre Nugues et al. Exploring Lexicalized Features for Coreference Resolution. CoNLL Shared Task, 2011.
[9] Claire Gardent et al. Improving Machine Learning Approaches to Coreference Resolution. ACL, 2002.