Mining Twitter data for causal links between tweets and real-world outcomes

Abstract

The authors present an expert and intelligent system that (1) identifies, from Twitter data, influential term groups that have causal relationships with real-world enterprise outcomes and (2) quantifies the appropriate time lags between those term groups and the outcomes. Existing expert and intelligent systems, defined as computer systems that imitate human decision-making ability, could enable computers to automatically identify the spread of Twitter users' enterprise-related feedback. However, they are limited in their ability to automatically identify the causal effects of that feedback on enterprise outcomes. Identifying these causal effects is important because Twitter users' feedback on enterprise decisions may have real-world implications. The proposed system supports decision makers by accounting for the real-world effects of the identified feedback on enterprise outcomes. Specifically, (1) a co-occurrence network analysis model is used to discover candidate terms for generating influential term groups, which are combinations of enterprise-related terms that potentially influence enterprise outcomes; (2) time series models and (3) a Granger causality analysis model are then employed to identify the influential term groups that have causal relationships with enterprise outcomes, together with the appropriate time lags. Case studies involving a real-world internet video streaming and disc rental provider and an airline company are used to test the validity of the proposed system, both for predicting enterprise outcomes over a long period and for predicting the short-term effects of specific events on enterprise outcomes.
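The co-occurrence network analysis step can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' model: it assumes tweets are already tokenized (stop words removed, terms stemmed), links two terms whenever they appear in the same tweet, ranks terms by weighted degree in the resulting network, and forms term groups as all combinations of up to `max_size` top-ranked candidates. The ranking rule, the function names, and the toy tweets (with `brand` standing in for the enterprise's name) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(tweets):
    """Count how often each unordered pair of distinct terms shares a tweet."""
    edges = Counter()
    for tokens in tweets:
        terms = sorted(set(tokens))  # ignore repeated terms within one tweet
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges


def candidate_terms(edges, top_k, exclude=()):
    """Hypothetical ranking: weighted degree (sum of incident edge weights)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    # Highest degree first; ties broken alphabetically for a stable order.
    ranked = sorted(
        (item for item in degree.items() if item[0] not in exclude),
        key=lambda item: (-item[1], item[0]),
    )
    return [term for term, _ in ranked[:top_k]]


def term_groups(candidates, max_size):
    """Enumerate term groups: every combination of 1..max_size candidates."""
    groups = []
    for size in range(1, max_size + 1):
        groups.extend(combinations(candidates, size))
    return groups


# Toy, already-tokenized tweets; "brand" is a placeholder enterprise name.
tweets = [
    ["brand", "price", "increase"],
    ["brand", "price", "cancel"],
    ["brand", "cancel", "subscription"],
    ["price", "increase", "cancel"],
]
edges = cooccurrence_network(tweets)
candidates = candidate_terms(edges, top_k=3, exclude={"brand"})
print(candidates)  # ['cancel', 'price', 'increase']
print(len(term_groups(candidates, max_size=2)))  # 6
```

In a full pipeline, each group would presumably be converted into a time series, for example the daily number of tweets containing all of its terms, before causality testing.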
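The time series and Granger causality steps can be sketched with ordinary least squares in NumPy. This is an illustration under assumptions, not the authors' method: autoregressive regressions stand in for their time series models, `x` might be a daily count of tweets containing an influential term group and `y` a daily enterprise outcome, and the lag is chosen by the Akaike information criterion (AIC), which the abstract does not specify. The test compares a restricted model (past values of `y` only) with an unrestricted one (past values of `y` and `x`); a large F statistic suggests that `x` Granger-causes `y` at that lag. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def lag_matrix(series, p, start):
    """Stack lags 1..p of `series` as columns aligned with series[start:]."""
    t = len(series)
    return np.column_stack([series[start - k : t - k] for k in range(1, p + 1)])


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit of y on X plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(x, y, p, start):
    """Bivariate Granger F-test of H0: lags 1..p of x do not help predict y.

    Returns (F statistic, numerator df, denominator df, AIC of the
    unrestricted model). All fits use the sample y[start:], with start >= p.
    """
    target = y[start:]
    y_lags = lag_matrix(y, p, start)
    x_lags = lag_matrix(x, p, start)
    rss_r = ols_rss(y_lags, target)                       # y's own past only
    rss_u = ols_rss(np.hstack([y_lags, x_lags]), target)  # y's past plus x's past
    n = len(target)
    df_num, df_den = p, n - 2 * p - 1
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    aic = n * np.log(rss_u / n) + 2 * (2 * p + 1)  # Gaussian AIC, constants dropped
    return f_stat, df_num, df_den, aic


def select_lag(x, y, max_lag):
    """Assumption: try lags 1..max_lag on one common sample, keep lowest AIC."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {p: granger_f(x, y, p, start=max_lag) for p in range(1, max_lag + 1)}
    best = min(results, key=lambda p: results[p][3])
    return best, results[best]


# Synthetic check: y follows x with a two-step delay; x is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.zeros(200)
y[2:] = 0.8 * x[:-2] + 0.1 * rng.normal(size=198)

lag_xy, (f_xy, d1, d2, _) = select_lag(x, y, max_lag=5)
lag_yx, (f_yx, _, _, _) = select_lag(y, x, max_lag=5)
print(f"x -> y: lag={lag_xy}, F({d1}, {d2}) = {f_xy:.1f}")
print(f"y -> x: lag={lag_yx}, F = {f_yx:.2f}")
```

All candidate lags are fitted on the same trimmed sample (`start=max_lag`) so that their AIC values are comparable. The sketch stops at the F statistic; a p-value would come from the F distribution with `(d1, d2)` degrees of freedom, for example via `scipy.stats.f.sf(f_xy, d1, d2)`.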