Automatic recognition of speech, thought, and writing representation in German narrative texts

This article presents the main results of a project that explored how to automatically recognize and classify a narrative feature, speech, thought, and writing representation (ST&WR), using surface information and methods of computational linguistics. The task was to detect and distinguish four types of ST&WR (direct, free indirect, indirect, and reported) in a corpus of manually annotated German narrative texts. Rule-based and machine-learning methods were tested and compared. Results were best for direct ST&WR (best F1 score: 0.87), followed by indirect (0.71), reported (0.58), and finally free indirect ST&WR (0.40). The rule-based approach worked best for ST&WR types with clear patterns, such as indirect and marked direct ST&WR, and often gave the most accurate results. Machine learning was most successful for types without clear indicators, such as free indirect ST&WR, and proved more stable. For the percentage of ST&WR in a text, the results of the machine-learning methods always correlated best with those of the manual annotation. Combining the outputs of the two approaches by union or intersection did not lead to substantial improvements. A stricter definition of ST&WR that excluded borderline cases made the task harder and led to worse results for both approaches.
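To make the evaluation notions mentioned above concrete (an F1 score per ST&WR type, the union or intersection of the rule-based and machine-learning outputs, and the correlation of per-text ST&WR percentages with the manual annotation), here is a minimal Python sketch. It assumes binary token-level labels for a single ST&WR type; token-level scoring is an assumption made for simplicity, and the functions `f1`, `union`, `intersection`, `share`, and `pearson`, as well as all data, are hypothetical illustrations rather than the project's actual evaluation code or results.

```python
import math


def f1(gold, pred):
    """Token-level F1 for one ST&WR type; gold and pred are equal-length 0/1 lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0  # also avoids division by zero when nothing is predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def union(a, b):
    """A token counts as ST&WR if either system labels it."""
    return [int(x or y) for x, y in zip(a, b)]


def intersection(a, b):
    """A token counts as ST&WR only if both systems label it."""
    return [int(x and y) for x, y in zip(a, b)]


def share(labels):
    """Percentage of tokens labeled as ST&WR in one text."""
    return 100 * sum(labels) / len(labels)


def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical token labels for one short passage.
gold = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]   # manual annotation
rules = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]  # hypothetical rule-based output
ml = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]     # hypothetical machine-learning output

for name, pred in [("rules", rules), ("ml", ml),
                   ("union", union(rules, ml)),
                   ("intersection", intersection(rules, ml))]:
    print(f"{name:>12}: F1 = {f1(gold, pred):.3f}")

# Hypothetical per-text ST&WR percentages (manual vs. automatic) for four texts.
manual_share = [12.0, 30.5, 8.2, 21.0]
auto_share = [10.5, 28.0, 9.9, 19.5]
print(f"correlation: {pearson(manual_share, auto_share):.3f}")
```

With real data, the gold labels would come from the manual annotation and the two prediction lists from the respective systems. Because a union can only add positive labels, its recall is never lower than that of either input, while an intersection's recall is never higher; whether either combination improves F1 depends on how much the two systems' errors overlap.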

Annelen Brunner
