Social network analysis, which focuses on social entities, has recently been applied in the social and behavioral sciences, web science, economics, and marketing. In this paper we present a method for constructing a social network from literary fiction using simple lexical analysis, without complex natural language processing tools. We show that these social graphs, which we call literary social graphs, exhibit power-law distributions in some of their features, a typical characteristic of complex systems. We also show that the social network extracted from literary data reflects a network structure similar to the one semantically designed by the author. We further propose the concept of the kernel of a literary social network, by which we can classify the level of abstraction of the protagonists appearing in a work of fiction. Our study shows that the metric distance among characters in the linear text is very similar to the intrinsic, semantic relationships described by the writer, which implies that the proposed social network could serve as another representation of a literary work. Other scientific and quantitative approaches can therefore be applied by analyzing the concrete social graph model extracted from the textual data.

Extracting useful information from large textual repositories is becoming essential in the data mining field. One difficulty in this work is how to deal with the variety of natural languages. Most work has been based on English texts, but recent work on other languages, including East Asian languages, has shown interesting results. With the recent prevalence of social network services (SNS), people try to extract sentiment from the text data shared among online users. After obtaining the attitude of each individual, we can identify the connection structure of the elements in a community.
The main issue in this kind of sentiment analysis is how to identify the polarity of adjectives based on the conjunctions linking them in a large corpus. Another active research area is mining information from online discussions by observing discussion threads. By mining the words used or the reply patterns within a discussion thread, we can identify friendly or conflicting groups. Our basic idea is that a complicated text (e.g., a long work of literary fiction) can be regarded as a typical complex system.
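To make the idea of building a social graph from the linear text more concrete, the following is a minimal sketch and not the procedure used in this paper. It assumes that characters are given as a list of single-token names, that two characters are linked whenever their names appear within a fixed number of tokens of each other, and that the edge weight is the number of such co-occurrences. The function names `build_character_network` and `degree_distribution`, the `window` parameter, and the sample sentence are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def build_character_network(text, characters, window=20):
    """Count how often each pair of characters is mentioned within
    `window` tokens of each other (an assumed proximity rule)."""
    tokens = re.findall(r"\w+", text.lower())
    names = {c.lower() for c in characters}
    # Positions of every character mention in the linear text.
    mentions = [(i, tok) for i, tok in enumerate(tokens) if tok in names]

    edges = Counter()
    for a in range(len(mentions)):
        pos_a, name_a = mentions[a]
        for b in range(a + 1, len(mentions)):
            pos_b, name_b = mentions[b]
            if pos_b - pos_a > window:
                break  # later mentions are even farther away
            if name_a != name_b:
                edges[tuple(sorted((name_a, name_b)))] += 1
    return edges


def degree_distribution(edges):
    """Map each degree value to the number of characters having it."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


# Hypothetical sample text, for illustration only.
sample = ("Anna met Boris at the station. "
          "Later Boris wrote to Clara, and Clara told Anna.")
edges = build_character_network(sample, ["Anna", "Boris", "Clara"], window=5)
distribution = degree_distribution(edges)
```

On a full novel, plotting `distribution` on log-log axes would be one simple way to inspect whether the degrees look heavy-tailed. Whether a power law actually holds depends on the text and on the chosen window.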