Detecting Scenes in Fiction: A new Segmentation Task

This paper introduces the novel task of scene segmentation on narrative texts and provides an annotated corpus, a discussion of the linguistic and narrative properties of the task, and baseline experiments towards automatic solutions. A scene here is a segment of the text where story time and discourse time are more or less equal, the narration focuses on one action and one location, and the character constellation stays the same. The corpus we describe consists of German-language dime novels (550k tokens) that have been annotated in parallel, achieving an inter-annotator agreement of γ = 0.7. Baseline experiments using BERT achieve an F1 score of 24%, showing that the task is very challenging. Automatic scene segmentation paves the way towards processing longer narrative texts like tales or novels by breaking them down into smaller, coherent and meaningful parts. This is an important stepping stone towards the reconstruction of plot in Computational Literary Studies, and it can also serve to improve tasks like coreference resolution.
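One way to make the task and its F1 score concrete, assumed here rather than taken from the text above, is to label every sentence with 1 if it opens a new scene and 0 otherwise, and to score a system by exact-match F1 over the predicted boundary positions. The following is a minimal sketch under that assumption; the function names `boundaries_from_labels` and `boundary_f1` are hypothetical, and the code does not reproduce the paper's own evaluation setup.

```python
from typing import Sequence, Set


def boundaries_from_labels(labels: Sequence[int]) -> Set[int]:
    """Return the indices of sentences labelled 1.

    Assumed (hypothetical) encoding: 1 marks a sentence that starts a new
    scene, 0 marks a sentence that continues the current one.
    """
    return {i for i, label in enumerate(labels) if label == 1}


def boundary_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Exact-match F1 over scene-boundary positions (illustrative only)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same sentences")
    gold_b = boundaries_from_labels(gold)
    pred_b = boundaries_from_labels(pred)
    # A true positive is a boundary predicted at exactly the gold position.
    tp = len(gold_b & pred_b)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_b)
    recall = tp / len(gold_b)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [0, 0, 1, 0, 0, 1, 0]  # scenes start at sentences 2 and 5
    pred = [0, 1, 0, 0, 0, 1, 0]  # system predicts sentences 1 and 5
    print(boundary_f1(gold, pred))
```

Here precision and recall are both 0.5, so F1 is 0.5. Exact matching is strict: a boundary placed one sentence too early counts as both a false positive and a false negative.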
