A modular approach to learning Dutch co-reference

This paper presents the first machine learning approach to the resolution of co-referential relations between nominal constituents in Dutch. Based on the hypothesis that different information sources contribute to the correct resolution of different types of co-referential links (pronominal, proper noun and common noun), we propose a modular approach in which a separate module is trained per NP type. We present a thorough comparison of two machine learning techniques, a lazy learner and an eager learner, trained on the modular tasks as well as on the undecomposed task. In addition, we show that postprocessing the resulting co-reference chains with a string-edit distance correction mechanism avoids some unlikely local chainings and thereby improves precision. Since no comparative results are available for Dutch, we also report results on the English MUC-6 and MUC-7 data sets, which are widely used for evaluation.
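To make the postprocessing idea concrete, the following is a minimal sketch of how a string-edit distance check might break unlikely links in a co-reference chain. It is an illustration under stated assumptions, not the correction mechanism evaluated in the paper: the function names, the length-normalized Levenshtein distance, the 0.5 threshold, and the restriction to name-like mentions (pronouns would need separate handling) are all hypothetical choices.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Case-insensitive edit distance scaled to [0, 1] by the longer string."""
    if not a and not b:
        return 0.0
    return edit_distance(a.lower(), b.lower()) / max(len(a), len(b))


def split_unlikely_links(chain: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Split a chain wherever two adjacent mentions are too dissimilar.

    Assumption: `chain` holds only name-like mentions in document order,
    and `threshold` is an illustrative value, not one taken from the paper.
    """
    if not chain:
        return []
    chains = [[chain[0]]]
    for mention in chain[1:]:
        if normalized_distance(chains[-1][-1], mention) > threshold:
            chains.append([mention])    # unlikely link: start a new chain
        else:
            chains[-1].append(mention)  # plausible link: keep chaining
    return chains


if __name__ == "__main__":
    print(split_unlikely_links(["Clinton", "Clintons", "Microsoft"]))
```

Splitting rather than discarding mentions keeps every mention in some chain; a stricter threshold trades recall for precision, which is the direction of the effect described above.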
