SocialCube: A Text Cube Framework for Analyzing Social Media Data

The recent growth of social media (e.g., Twitter, Facebook, and blogs) provides an unprecedented opportunity to study human social and cultural behaviors. These sources provide rich structured data (e.g., XML, relational tables, and categorical data) as well as unstructured data (e.g., text). A significant challenge is to summarize and navigate structured data together with unstructured text for efficient querying and analysis. In this paper, we introduce a text cube architecture designed to organize social media data along multiple dimensions and hierarchies so that information can be queried and visualized efficiently from multiple perspectives. For example, an affective process cube allows the analyst to examine public reactions (e.g., sadness, anger) to a range of social phenomena. The text cube architecture also supports the development of prediction models using the summarized statistics stored in a data cube. For example, models that detect events, such as violent protests in the Egyptian Revolution, can be built using the linguistic features stored in an event data cube. Such models represent a higher level of knowledge representation and may help to develop more effective decision-making strategies based on social media data.
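To make the idea of an affective process cube concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a toy affect lexicon (a real system might use a resource such as LIWC), three made-up posts, and a location hierarchy of country and city plus a date dimension. The names AFFECT_LEXICON, POSTS, build_cube, and roll_up are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical mini-lexicon mapping words to affect categories (assumption).
AFFECT_LEXICON = {
    "sad": "sadness", "grief": "sadness",
    "angry": "anger", "furious": "anger",
}

# Made-up posts: (country, city, date, text). Not real data.
POSTS = [
    ("Egypt", "Cairo", "2011-01-25", "People are angry and furious in the square"),
    ("Egypt", "Cairo", "2011-01-26", "So sad about the news"),
    ("Egypt", "Alexandria", "2011-01-25", "Angry crowds gather"),
]

def build_cube(posts):
    """Build base cells keyed by (country, city, date).

    Each cell stores a summarized measure: counts of affect categories
    found in the text of the posts that fall into that cell.
    """
    cube = defaultdict(lambda: defaultdict(int))
    for country, city, date, text in posts:
        for token in text.lower().split():
            category = AFFECT_LEXICON.get(token)
            if category is not None:
                cube[(country, city, date)][category] += 1
    return cube

def roll_up(cube, keep):
    """Aggregate cells, keeping only the dimension positions listed in keep."""
    rolled = defaultdict(lambda: defaultdict(int))
    for key, counts in cube.items():
        reduced_key = tuple(key[i] for i in keep)
        for category, n in counts.items():
            rolled[reduced_key][category] += n
    return rolled

cube = build_cube(POSTS)
by_city = roll_up(cube, keep=(0, 1))   # drop the date dimension
by_country = roll_up(cube, keep=(0,))  # move up the location hierarchy

for key, counts in sorted(by_country.items()):
    print(key, dict(counts))
```

Rolling up from city to country simply sums the per-cell counts, which is why count-like measures are convenient to store in a cube: coarser views can be derived from finer ones without rereading the raw text.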
