Corpus REDEWIEDERGABE

This article presents the corpus REDEWIEDERGABE, a German-language historical corpus with detailed annotations for speech, thought and writing representation (ST&WR). With approximately 490,000 tokens, it is the largest resource of its kind. It can be used to answer literary and linguistic research questions and can serve as training material for machine learning. The article describes the composition of the corpus and the annotation structure, discusses some methodological decisions, and gives basic statistics about the forms of ST&WR found in the corpus.
