Twitter sentiment classification using machine learning techniques for stock markets

Sentiment classification of Twitter data has been applied successfully to make predictions in a variety of domains. However, using sentiment classification to predict stock market variables remains a challenging, open research problem. The main objective of this study is to compare the overall accuracy of two machine learning techniques (logistic regression and a neural network) in assigning positive, negative, or neutral sentiment to stock-related tweets. Both classifiers are compared under two weighting schemes: Bigram term frequency (TF) and Unigram term frequency-inverse document frequency (TF-IDF). The classifiers are trained on a dataset of 42,000 automatically annotated tweets. The training dataset comprises positive, negative, and neutral tweets covering four technology-related stocks (Twitter, Google, Facebook, and Tesla), collected using the Twitter Search API. The two classifiers achieve the same overall accuracy (58%). However, the experiments show that Unigram TF-IDF weighting outperforms Bigram TF weighting.
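To make the two weighting schemes concrete, the following minimal sketch computes Bigram TF and Unigram TF-IDF features for a few toy tweets. It is an illustration under stated assumptions, not the study's implementation: the abstract does not specify tokenization or the exact IDF variant, so whitespace tokenization, raw counts for TF, and idf = ln(N / df) are assumed here, and the example tweets and function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf(tokens, n):
    """Raw term-frequency counts of the n-grams in one tweet."""
    return Counter(ngrams(tokens, n))


def tf_idf(docs, n=1):
    """TF-IDF weights per document, assuming idf = ln(N / df)."""
    counts = [tf(doc, n) for doc in docs]
    # Document frequency: number of tweets in which each term occurs.
    df = Counter(term for c in counts for term in c)
    total = len(docs)
    return [
        {term: freq * math.log(total / df[term]) for term, freq in c.items()}
        for c in counts
    ]


# Hypothetical toy tweets, tokenized on whitespace (an assumption).
tweets = [
    "tesla stock is up".split(),
    "tesla stock is down".split(),
    "google earnings beat estimates".split(),
]

bigram_tf = [tf(t, 2) for t in tweets]   # Bigram TF features
unigram_tfidf = tf_idf(tweets, 1)        # Unigram TF-IDF features
```

Under this IDF variant, a term that appears in every tweet receives weight zero, while rarer terms such as "up" or "down" are weighted more heavily. Either feature dictionary could then be vectorized and passed to a logistic regression or neural network classifier.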
