NOVEL2GRAPH: Visual Summaries of Narrative Text Enhanced by Machine Learning

We present a machine learning approach to the creation of visual summaries for narrative text. Standard natural language processing tools for named entity recognition are combined with a clustering algorithm to detect the characters of a novel and their aliases. The most relevant characters and their relations are selected on the basis of a simple statistical analysis. These characters are depicted as the nodes of an undirected graph whose edges describe relations between characters. Specialized sentiment analysis techniques based on sentence embeddings determine the colours of the characters (nodes) and of their relations (edges). Additional information about the characters (e.g., gender) and their relations (e.g., siblings or partnerships) is returned by binary classifiers and also depicted in the graph. For these specialized tasks, small amounts of manually annotated data are sufficient to achieve good accuracy. Compared to analogous tools, this machine learning approach allows for a richer representation of texts of this kind. A case study demonstrating the approach on a series of books is also reported.
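To make the graph-building step concrete, the following is a minimal sketch of how characters and their relations could be turned into a weighted undirected graph. It is not the authors' implementation. It assumes that alias detection has already produced a mapping from each alias to a canonical character name, and it assumes that a relation can be approximated by two characters being mentioned in the same sentence. The function name `build_character_graph`, the `min_weight` threshold, and the toy sentences are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def build_character_graph(sentences, aliases, min_weight=1):
    """Count sentence-level co-occurrences between canonical characters.

    sentences  -- list of sentence strings
    aliases    -- dict mapping a single-token alias to a canonical name
                  (assumed to come from an earlier NER + clustering step)
    min_weight -- hypothetical threshold: edges seen fewer times are dropped

    Returns a dict {(name_a, name_b): weight}, with name_a < name_b so that
    each undirected edge is stored exactly once.
    """
    edges = Counter()
    for sentence in sentences:
        # Crude tokenization: split on whitespace, strip surrounding punctuation.
        tokens = {tok.strip(".,;:!?\"'()") for tok in sentence.split()}
        # Resolve aliases to canonical names; the set removes duplicates.
        present = sorted({aliases[t] for t in tokens if t in aliases})
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}


# Toy data, invented for this example only.
sentences = [
    "Elizabeth met Darcy at the ball.",
    "Lizzy and Jane walked home.",
    "Darcy wrote to Jane.",
    "Elizabeth spoke to Darcy again.",
]
aliases = {
    "Elizabeth": "Elizabeth",
    "Lizzy": "Elizabeth",  # alias resolved to the same character
    "Darcy": "Darcy",
    "Jane": "Jane",
}

graph = build_character_graph(sentences, aliases)
for (a, b), weight in sorted(graph.items()):
    print(a, "--", b, "weight", weight)
```

In a fuller pipeline, the edge weights could serve as the simple statistic used to keep only the most relevant relations (for example by raising `min_weight`), and the node and edge colours would come from separate sentiment and relation classifiers, which this sketch does not attempt to model.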
