Predictive Analytics with Big Social Data

Recent research in the field of computational social science has shown how data resulting from the widespread adoption and use of social media channels such as Twitter can be used to predict outcomes such as movie revenues, election winners, localized moods, and epidemic outbreaks. The underlying assumptions of this research stream on predictive analytics are that social media actions such as tweeting, liking, commenting, and rating are proxies for a user's or consumer's attention to a particular object or product, and that the persistent, shared digital artefact can create social influence. In this paper, we demonstrate how social media data from Twitter and Facebook can be used to predict the quarterly sales of iPhones and the revenues of H&M, respectively. Based on a conceptual model of social data consisting of social graph (actors, actions, activities, and artefacts) and social text (topics, keywords, pronouns, and sentiments), we develop and evaluate linear regression models that transform (a) iPhone tweets into a prediction of quarterly iPhone sales, with an average error close to that of the established prediction models from investment banks (Lassen, Madsen, & Vatrapu, 2014), and (b) Facebook likes into a prediction of the global revenue of the fast fashion company H&M. We discuss the findings and conclude with implications for predictive analytics with big social data.

Research Question

Our basic premise is that social media actions can serve as proxies for users' attention and, as such, have predictive power. Our central research question is: to what extent can big social data predict real-world outcomes such as sales and revenues?

Related Work

We deliberately limit the review of extant literature to empirical work that examines the relationship between social data measures (such as Facebook posts, likes, comments, and shares, and Twitter tweets, re-tweets, mentions, and polarity) and real-world business outcomes (such as revenues and stock prices).
A substantial body of research (Bakshy, Simmons, Huffaker, Teng, & Adamic, 2010; Bollen & Mao, 2011; Dorr & Denton, 2009; Gavrilov, Anguelov, Indyk, & Motwani, 2000; Kharratzadeh & Coates, 2012; Mittermayer, 2004) has sought to predict companies' stock prices by analysing content from online media such as news items, weblogs, and Twitter feeds. For example, Gavrilov et al. (2000) applied data mining techniques to stock information from various companies, clustering them according to their Standard and Poor (S&P) 500 index, whereas Kharratzadeh & Coates (2012) used weblog content to identify the underlying relationships between companies and to make predictions about the evolution of their stock prices. The most notable paper in this regard is Asur & Huberman (2010), who showed that social media feeds can be effective indicators of real-world performance. They analysed the hourly rate of tweets about movies, their re-tweets, and their sentiment polarity to accurately forecast box-office revenues. In fact, their predictions of movie revenues based on social data measures from Twitter outperformed the leading market-based predictions of the Hollywood Stock Exchange. In terms of macro-societal relationships, Bollen and Mao (2011) investigated whether public mood, as measured from a large-scale collection of tweets, is correlated with or even predictive of Dow Jones Industrial Average (DJIA) values.
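
To make the regression setting concrete, the following is a minimal sketch of a one-variable ordinary least squares model that maps a quarterly social data measure to a quarterly business outcome and reports an average relative error. It is an illustration only and not the model evaluated in this paper: the single-predictor form, the function names, and the numbers are assumptions, and the values are synthetic placeholders rather than observed tweet counts or sales.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mean_abs_pct_error(actual, predicted):
    """Average of |actual - predicted| / |actual| over all observations."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Synthetic placeholder values (NOT data from the paper):
# one social data measure and one outcome value per quarter.
social_measure = [1.0, 2.0, 3.0, 4.0, 5.0]
outcome = [12.0, 19.0, 31.0, 42.0, 48.0]

intercept, slope = fit_simple_ols(social_measure, outcome)
predicted = [intercept + slope * s for s in social_measure]
error = mean_abs_pct_error(outcome, predicted)

print(f"intercept={intercept:.2f}, slope={slope:.2f}, mean abs pct error={error:.3f}")
```

In this sketch the error is computed on the same quarters used for fitting. A comparison against external forecasts would instead require predictions for quarters held out from the fit.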

[1] Anne M. Denton, et al. Establishing relationships among patterns in stock market data, 2009, Data Knowl. Eng.

[2] Bernardo A. Huberman, et al. Predicting the Future with Social Media, 2010, Web Intelligence.

[3] Marc-André Mittermayer, et al. Forecasting Intraday stock price trends with text mining techniques, 2004, Proceedings of the 37th Annual Hawaii International Conference on System Sciences.

[4] Piotr Indyk, et al. Mining the stock market (extended abstract): which measure is best?, 2000, KDD '00.

[5] Mark Coates, et al. Weblog Analysis for Predicting Correlations in Stock Price Evolutions, 2012, ICWSM.

[6] Ravikiran Vatrapu, et al. Predicting iPhone Sales from iPhone Tweets, 2014, 2014 IEEE 18th International Enterprise Distributed Object Computing Conference.

[7] Lada A. Adamic, et al. The Social Dynamics of Economic Activity in a Virtual World, 2010, ICWSM.

[8] Johan Bollen, et al. Twitter Mood as a Stock Market Predictor, 2011, Computer.