Text mining for market prediction: A systematic review

How well the sentiment in social media and online news is interpreted can determine how predictable financial markets are, and can lead to large gains or losses. For this reason, a number of researchers have recently turned their attention to different aspects of this problem. However, to the best of our knowledge, there is no well-rounded theoretical and technical framework for approaching it. We believe this lack of clarity stems from the topic's interdisciplinary nature, which involves at its core both behavioral economics and artificial intelligence. We examine this interdisciplinary nature in more depth and contribute to forming a clear frame of discussion. We review related work on market prediction based on online text mining and outline the generic components that these systems share. Furthermore, we compare each system with the rest and identify their main differentiating factors. Our comparative analysis extends to the theoretical and technical foundations behind each system. This work should help the research community structure this emerging field and identify the specific aspects that require further research and are of special significance.
