The Extended DIRNDL Corpus as a Resource for Coreference and Bridging Resolution

DIRNDL is a spoken and written corpus based on German radio news. It features coreference and information-status annotation (including bridging anaphora and their antecedents) as well as prosodic information. We have recently extended DIRNDL with a fine-grained, two-dimensional information-status labeling scheme. We have also applied a state-of-the-art part-of-speech and morphology tagger to the corpus, along with highly accurate constituency and dependency parsers. In light of these developments, we believe that DIRNDL is an interesting resource for NLP researchers working on automatic coreference and bridging resolution. To enable and promote use of the data, we make it available for download in an accessible tabular format that is compatible with the formats used in the CoNLL and SemEval shared tasks on automatic coreference resolution.
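
To illustrate what working with such a tabular format can look like, the following is a minimal sketch of reading coreference chains from a CoNLL-style file. It assumes, without confirmation from the corpus documentation, that each line holds one token with tab-separated columns, that sentences are separated by blank lines, and that the last column carries coreference in the bracket notation of the CoNLL shared tasks (`(12` opens a mention of chain 12, `12)` closes it, `(12)` is a single-token mention, `|` separates several tags on one token, `-` or `_` marks no mention). The function names and the sample sentence are hypothetical; the actual DIRNDL column layout should be checked against the released data.

```python
import re
from collections import defaultdict


def read_sentences(text):
    """Split tab-separated CoNLL-style text into sentences (lists of rows).

    Assumption: blank lines separate sentences, lines starting with '#' are comments.
    """
    sentences, rows = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            continue
        if not line.strip():
            if rows:
                sentences.append(rows)
                rows = []
            continue
        rows.append(line.split("\t"))
    if rows:
        sentences.append(rows)
    return sentences


def parse_coref_column(tags):
    """Map chain id -> list of (start, end) token spans, both inclusive.

    `tags` is the coreference column of one sentence in CoNLL bracket notation.
    """
    open_starts = defaultdict(list)  # chain id -> stack of open mention starts
    chains = defaultdict(list)
    for i, tag in enumerate(tags):
        if tag in ("-", "_", ""):
            continue
        for part in tag.split("|"):
            m = re.fullmatch(r"(\()?(\d+)(\))?", part)
            if m is None or not (m.group(1) or m.group(3)):
                raise ValueError(f"unexpected coreference tag: {part!r}")
            chain_id = int(m.group(2))
            if m.group(1):  # opening bracket: remember where the mention starts
                open_starts[chain_id].append(i)
            if m.group(3):  # closing bracket: pair with the most recent start
                if not open_starts[chain_id]:
                    raise ValueError(f"unmatched close for chain {chain_id}")
                chains[chain_id].append((open_starts[chain_id].pop(), i))
    return dict(chains)


# Hypothetical two-column sample (token, coreference); not taken from DIRNDL.
sample = "Die\t(1\nRegierung\t1)\nhat\t-\nihre\t(2|(1)\nPläne\t2)\nvorgestellt\t-\n"

for sentence in read_sentences(sample):
    coref_tags = [row[-1] for row in sentence]  # assumption: coref is the last column
    print(parse_coref_column(coref_tags))
```

Keeping a stack of open starts per chain allows nested mentions of the same entity, which the bracket notation permits. Bridging and information-status labels would need their own columns and are not covered by this sketch.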
