Predicting a Stock Portfolio with the Multivariate Bayesian Structural Time Series Model: Do News or Emotions Matter?

In this paper, we provide methods for creatively incorporating information from financial news and Twitter feeds into predicting the prices of a portfolio of stocks, using the framework of the Multivariate Bayesian Structural Time Series (MBSTS) model. MBSTS is a Bayesian machine learning model designed to capture correlations among multiple target time series while using a number of contemporaneous predictors. As an illustration of the model, we use data on two leading online commerce companies, Amazon and eBay, and run extensive empirical experiments to examine which text mining predictors, if any, add to the predictability of a stock price. We evaluate competing models, such as the autoregressive integrated moving average (ARIMA) model and the recurrent neural network (RNN) model with long short-term memory (LSTM), in terms of their cumulative one-step-ahead forecast errors with and without sentiment predictors. Our contributions are threefold. First, our model is the first to successfully incorporate online text mining into an advanced multivariate Bayesian machine learning time series model, which opens the door to applying text mining and machine learning simultaneously in modern quantitative finance research. Second, in the presence of both modern and classical predictors, in both the fundamental and the technical sense, the polarity of news still has a complementary effect. Third, we find that all models under investigation perform better with sentiment predictors than without them, and that the MBSTS model with sentiment predictors outperforms all the other models.
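
The comparison criterion named above, cumulative one-step-ahead forecast error, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the error at each step is the absolute difference between the observed value and its one-step-ahead forecast, and the function name and all numbers are hypothetical.

```python
from itertools import accumulate


def cumulative_one_step_errors(actual, forecasts):
    """Running sum of absolute one-step-ahead forecast errors.

    Assumption: forecasts[t] is the prediction of actual[t] made using
    only information available up to time t-1.
    """
    if len(actual) != len(forecasts):
        raise ValueError("actual and forecasts must have the same length")
    abs_errors = [abs(a - f) for a, f in zip(actual, forecasts)]
    return list(accumulate(abs_errors))


# Hypothetical prices and two hypothetical sets of one-step-ahead forecasts.
actual = [100.0, 102.0, 101.0, 105.0]
without_sentiment = [99.0, 100.0, 103.0, 102.0]
with_sentiment = [100.0, 101.0, 102.0, 104.0]

cum_without = cumulative_one_step_errors(actual, without_sentiment)
cum_with = cumulative_one_step_errors(actual, with_sentiment)
```

Plotting the two cumulative curves against time shows whether one model's errors grow more slowly than the other's over the whole forecast horizon, rather than only at its end.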
