Network analysis of narrative content in large corpora

We present a methodology for extracting narrative information from a large corpus. The key idea is to transform the corpus into a network by linking the key actors and objects of the narration, and then to analyse this network to extract information about their relations. Because all the information is represented in a single network, relations between entities can be inferred even when those entities have never been mentioned together. We discuss the types of information that our method can extract, ways to validate the extracted information, and two application scenarios. The methodology is highly scalable and addresses specific research needs in the social sciences.
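The core idea can be illustrated with a minimal sketch. It assumes the corpus has already been reduced to subject-verb-object triplets; the triplets, entity names, and the functions `build_network` and `shortest_path` are hypothetical and chosen only for illustration, not taken from the authors' system. The sketch links each subject to its object and then uses a breadth-first search to relate two entities that never appear in the same triplet.

```python
from collections import defaultdict, deque

# Hypothetical (subject, verb, object) triplets, standing in for parser output.
TRIPLETS = [
    ("Alice", "supports", "Bob"),
    ("Bob", "criticises", "Carol"),
    ("Carol", "supports", "Dave"),
    ("Alice", "supports", "Bob"),
]


def build_network(triplets):
    """Link each subject to its object; the edge weight counts co-mentions."""
    graph = defaultdict(lambda: defaultdict(int))
    for subj, _verb, obj in triplets:
        graph[subj][obj] += 1
        graph[obj][subj] += 1
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if source not in graph or target not in graph:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


network = build_network(TRIPLETS)
# Alice and Dave never share a triplet, yet the network connects them.
print(shortest_path(network, "Alice", "Dave"))
```

Here the path through intermediate entities is what makes an indirect relation visible. A fuller treatment would also keep the verbs, for instance as signed edge labels, which this sketch discards.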
