IMS HotCoref DE: A Data-driven Co-reference Resolver for German

This paper presents a data-driven co-reference resolution system for German, adapted from IMS HotCoref, a co-reference resolver for English. It describes the difficulties of resolving co-reference in German text, the adaptation process, and the features designed to address the linguistic challenges posed by German. We report performance on the reference dataset TüBa-D/Z and include a post-task SemEval 2010 evaluation, showing that the resolver achieves state-of-the-art performance. We also report ablation experiments indicating that integrating linguistic features improves performance. Finally, the paper describes the steps and the input format required to use the resolver on new texts. The tool is freely available for download.
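The input format itself is not spelled out in this abstract. As an illustration only, suppose the resolver reads CoNLL-2012-style tabular input, one token per line. This is an assumption, not a confirmed description of the tool. In that convention, the last column marks co-reference with bracket labels such as `(3`, `3)`, `(3)` and `-`, and a single token can carry several labels joined by `|`. The hypothetical helper `parse_coref_column` below is a minimal sketch that decodes such a column into clusters of token spans. It is not part of IMS HotCoref DE.

```python
from collections import defaultdict


def parse_coref_column(labels):
    """Decode CoNLL-2012-style co-reference labels into clusters.

    Hypothetical helper (assumes CoNLL-2012 bracket notation).
    labels: one string per token, e.g. "(3", "3)", "(3)", "-",
            or several parts joined by "|" such as "(2|(3)".
    Returns a dict mapping cluster id -> sorted list of inclusive
    (start, end) token spans.
    """
    open_mentions = defaultdict(list)  # cluster id -> stack of open start indices
    clusters = defaultdict(list)       # cluster id -> completed mention spans
    for i, label in enumerate(labels):
        if label in ("-", "_"):        # token is not part of any mention boundary
            continue
        for part in label.split("|"):
            if part.startswith("(") and part.endswith(")"):
                # single-token mention opens and closes here
                clusters[int(part[1:-1])].append((i, i))
            elif part.startswith("("):
                # multi-token mention starts here
                open_mentions[int(part[1:])].append(i)
            elif part.endswith(")"):
                # close the most recently opened mention of this cluster
                cid = int(part[:-1])
                start = open_mentions[cid].pop()
                clusters[cid].append((start, i))
    return {cid: sorted(spans) for cid, spans in clusters.items()}


# Tokens: Die Kanzlerin sagte , sie komme .
labels = ["(1", "1)", "-", "-", "(1)", "-", "-"]
print(parse_coref_column(labels))
```

In the usage example, "Die Kanzlerin" (tokens 0 to 1) and "sie" (token 4) end up in the same cluster. A per-cluster stack is used so that nested mentions of the same entity close in last-opened, first-closed order.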
