Text-based crude oil price forecasting: A deep learning approach

Abstract This study proposes a novel crude oil price forecasting method based on text mining of online media, with the aim of capturing the more immediate market antecedents of price fluctuations. Specifically, it is an early attempt to apply deep learning techniques to crude oil forecasting, using a convolutional neural network (CNN) to extract hidden patterns from online news media. Although the news-text sentiment features and the features extracted by the CNN model show significant relationships with price changes, they need to be grouped by topic to obtain greater forecasting accuracy. The study therefore proposes a feature grouping method based on the Latent Dirichlet Allocation (LDA) topic model to distinguish the effects of different online news topics. An optimized combination of input variables is constructed using lag order selection and feature selection methods. Our empirical results suggest that the proposed topic-sentiment synthesis forecasting models outperform the benchmark models. In addition, text features and financial features are shown to be complementary in producing more accurate crude oil price forecasts.
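The topic grouping and lagging steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each article already carries a topic label (for example, the dominant topic from a previously fitted LDA model) and a sentiment score, and it only shows how such scores might be averaged per topic per day and then lagged into input rows. The function names `topic_sentiment_features` and `lagged_rows`, the tuple layout, and the sample values are all hypothetical.

```python
from collections import defaultdict


def topic_sentiment_features(articles, n_topics):
    """Average article sentiment per (day, topic).

    articles: iterable of (day, topic_id, sentiment) tuples. topic_id is
    assumed to come from an already fitted topic model such as LDA.
    Returns {day: [mean sentiment of topic 0, ..., topic n_topics - 1]},
    using 0.0 where a topic has no articles on that day.
    """
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(lambda: [0] * n_topics)
    for day, topic_id, sentiment in articles:
        sums[day][topic_id] += sentiment
        counts[day][topic_id] += 1
    return {
        day: [s / c if c else 0.0 for s, c in zip(sums[day], counts[day])]
        for day in sorted(sums)
    }


def lagged_rows(features, days, n_topics, max_lag):
    """Build one input row per day from the previous max_lag days.

    Each row concatenates the topic features at lag 1, lag 2, ..., max_lag.
    Days with no articles contribute a vector of zeros.
    """
    empty = [0.0] * n_topics
    rows = {}
    for i in range(max_lag, len(days)):
        row = []
        for lag in range(1, max_lag + 1):
            row.extend(features.get(days[i - lag], empty))
        rows[days[i]] = row
    return rows


# Hypothetical sample: (day, topic_id, sentiment) for two topics.
articles = [
    (1, 0, 0.5), (1, 0, -0.25), (1, 1, -0.5),
    (2, 1, 0.25),
    (3, 0, 0.75),
]
features = topic_sentiment_features(articles, n_topics=2)
rows = lagged_rows(features, days=[1, 2, 3], n_topics=2, max_lag=2)
```

In this sketch the lag length is fixed by hand. The abstract instead describes choosing it through lag order selection and then pruning inputs with feature selection, neither of which is shown here.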
