Different German and English Coreference Resolution Models for Multi-domain Content Curation Scenarios

Coreference resolution is the task of identifying all words and phrases in a text that refer to the same entity. It has proven to be a useful intermediate step for a number of natural language processing applications. In this paper, we describe three implementations of coreference resolution: rule-based, statistical, and projection-based (from English to German). After a comparative evaluation on benchmark datasets, we conclude by applying these systems to German and English texts from different digital curation scenarios, such as an archive of personal letters, excerpts from a museum exhibition, and regional news articles.
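
To make the task definition concrete, the following is a minimal sketch of how pairwise coreference decisions can be grouped into chains of mentions that refer to the same entity. It is not any of the three systems described in the paper. The function name `build_chains`, the example mentions, and the links between them are all illustrative assumptions, and mentions are represented as plain strings for brevity (a real system would use text spans).

```python
from collections import defaultdict


def build_chains(mentions, links):
    """Group mentions into coreference chains from pairwise links (union-find).

    `mentions` is a list of unique mention strings (an assumption for brevity);
    `links` is a list of (mention_a, mention_b) pairs judged to corefer.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = defaultdict(list)
    for m in mentions:  # keep mentions in document order within each chain
        chains[find(m)].append(m)

    # Singletons are not chains: keep only entities mentioned more than once.
    return [chain for chain in chains.values() if len(chain) > 1]


# Hypothetical mentions and pairwise decisions, for illustration only.
mentions = ["Angela Merkel", "the chancellor", "she", "Berlin", "the city"]
links = [
    ("Angela Merkel", "the chancellor"),
    ("the chancellor", "she"),
    ("Berlin", "the city"),
]
chains = build_chains(mentions, links)
print(chains)
```

Here the pairwise links are given by hand. In a rule-based or statistical system they would come from the resolver's own decisions, and only the grouping step would resemble this sketch.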
