Discovering public sentiment in social media for predicting stock movement of publicly listed companies

The popularity of social media sites has prompted both academic and practical research on mining social media data to analyze public sentiment. Studies have suggested that public emotions expressed on Twitter may be well correlated with the Dow Jones Industrial Average. However, it remains unclear how public sentiment, as reflected on social media, can be used to predict the stock price movement of a particular publicly listed company. In this study, we attempt to fill this research gap by proposing a technique, called SMeDA-SA, that mines Twitter data for sentiment analysis and then predicts the stock movement of specific listed companies. For experimentation, we collected 200 million tweets that mentioned one or more of 30 companies listed on NASDAQ or the New York Stock Exchange. SMeDA-SA first extracts ambiguous textual messages from these tweets to create a list of words that reflects public sentiment. It then uses a data mining algorithm to expand the word list with emotional phrases so as to better classify the sentiment of the tweets. With SMeDA-SA, we find that the stock movement of many companies can be predicted with an average accuracy of over 70%. This paper describes how SMeDA-SA can be used to mine social media data for sentiment and presents the key implications of our study. By combining both direct and indirect division operations, the proposed algorithm achieved higher prediction accuracy in our experiments than existing direct social media mining methods. The proposed algorithms perform better in certain industries, such as IT and media. Our study also indicates that the proposed algorithms perform better when current tweet sentiment is used to predict the stock price three days later.
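The abstract does not specify SMeDA-SA's word-extraction rules, its expansion algorithm, or the "direct and indirect division operations", so the Python sketch below only illustrates the general pipeline it outlines: score tweets against a sentiment word list, expand that list from the tweets themselves, aggregate sentiment per trading day, and check whether the sign of today's sentiment matches the price direction three trading days later. Everything here is an assumption for illustration: the seed lexicon, the co-occurrence rule in the hypothetical `expand_lexicon` (a simple stand-in, not the authors' data mining algorithm), the net-count daily aggregation, the trading-day indexing, and the toy data.

```python
import re
from collections import Counter, defaultdict

# HYPOTHETICAL seed lexicon (word -> polarity). SMeDA-SA derives its own
# word list from tweets; these entries are placeholders for illustration.
SEED_LEXICON = {"strong": 1, "beat": 1, "gain": 1, "miss": -1, "weak": -1, "loss": -1}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tweet_score(text, lexicon):
    """Sum the polarities of all lexicon words found in the tweet."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))


def expand_lexicon(texts, lexicon, min_support=2):
    """Stand-in expansion step (NOT the paper's algorithm): add an unseen word
    with a polarity if it appears in at least `min_support` tweets scored with
    that polarity and never in a tweet scored with the opposite polarity."""
    pos, neg = Counter(), Counter()
    for text in texts:
        score = tweet_score(text, lexicon)
        if score == 0:
            continue  # neutral/unscored tweets give no evidence
        new_tokens = {tok for tok in tokenize(text) if tok not in lexicon}
        (pos if score > 0 else neg).update(new_tokens)
    expanded = dict(lexicon)  # leave the seed lexicon unmodified
    for tok in pos.keys() | neg.keys():
        if pos[tok] >= min_support and neg[tok] == 0:
            expanded[tok] = 1
        elif neg[tok] >= min_support and pos[tok] == 0:
            expanded[tok] = -1
    return expanded


def daily_sentiment(tweets, lexicon):
    """Net sentiment per trading day: (# positive tweets) - (# negative tweets)."""
    net = defaultdict(int)
    for day, text in tweets:
        score = tweet_score(text, lexicon)
        if score > 0:
            net[day] += 1
        elif score < 0:
            net[day] -= 1
    return dict(net)


def lagged_accuracy(daily, closes, lag=3):
    """Share of days whose sentiment sign matches the direction of the closing
    price `lag` trading days later. Days with zero net sentiment or without a
    future close are skipped; returns None if no day can be evaluated."""
    hits = total = 0
    for day, net in daily.items():
        if net == 0 or day + lag >= len(closes):
            continue
        predicted_up = net > 0
        actual_up = closes[day + lag] > closes[day]
        hits += predicted_up == actual_up
        total += 1
    return hits / total if total else None


# Toy data: (trading-day index, tweet text) and daily closing prices.
tweets = [
    (0, "Strong quarter, rally ahead"),
    (0, "Another rally today"),
    (1, "Earnings miss, weak guidance"),
    (2, "Beat estimates and rally"),
]
closes = [10.0, 10.5, 10.2, 10.8, 10.1, 10.0]

lexicon = expand_lexicon([text for _, text in tweets], SEED_LEXICON)
daily = daily_sentiment(tweets, lexicon)
acc = lagged_accuracy(daily, closes, lag=3)
print(f"{acc:.3f}")  # → 0.667
```

In this toy run, "rally" is added to the word list because it appears in two positively scored tweets and no negative ones, which then lets the otherwise neutral tweet "Another rally today" count as positive. Counting tweets per day, rather than summing scores, keeps a single emphatic tweet from dominating a day; whether SMeDA-SA aggregates this way is not stated in the abstract.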
