Forecasting crude oil price with multilingual search engine data

Abstract In the big data era, search engine data (SED) have presented new opportunities for improving crude oil price prediction; however, existing research has been confined to single-language (mostly English) search keywords in SED collection. To address this language bias and capture worldwide investor attention, this study proposes a novel multilingual SED-driven forecasting methodology from a global perspective. The proposed methodology includes three main steps: (1) multilingual index construction, based on multilingual SED; (2) relationship investigation, between the multilingual index and crude oil price; and (3) oil price prediction, with the multilingual index as an informative predictor. Using the WTI spot price as the study sample, the empirical results indicate that SED have strong predictive power for crude oil price; moreover, multilingual SED statistically outperform single-language SED in terms of both prediction accuracy and model robustness.
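The three steps above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not specify the techniques, so the first principal component (for index construction), a lagged Pearson correlation (for relationship investigation), and a least-squares regression (for prediction) are stand-ins chosen for brevity. The data are synthetic, and every name (`build_index`, `lagged_corr`, `fit_predict`, `sed`, `attention`) is hypothetical. Each column of `sed` is assumed to hold the search volume for one keyword in one language.

```python
import numpy as np

def build_index(sed):
    """Step 1 (assumed): first principal component of standardized SED columns."""
    z = (sed - sed.mean(axis=0)) / sed.std(axis=0)
    # SVD of the standardized matrix; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    w = vt[0]
    if w.sum() < 0:  # fix the sign so the index rises with search volume
        w = -w
    return z @ w

def lagged_corr(index, price, lag=1):
    """Step 2 (assumed): Pearson correlation between index[t-lag] and price[t]."""
    return float(np.corrcoef(index[:-lag], price[lag:])[0, 1])

def fit_predict(index, price, n_train):
    """Step 3 (assumed): OLS of price[t] on price[t-1] and index[t-1]."""
    X = np.column_stack([np.ones(len(price) - 1), price[:-1], index[:-1]])
    y = price[1:]
    beta, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    return X[n_train:] @ beta, y[n_train:]

# Synthetic stand-in data: one latent "attention" series observed with noise
# through four keyword/language columns, and a price that reacts to it with a lag.
rng = np.random.default_rng(0)
T = 200
attention = rng.normal(size=T)
sed = np.column_stack([attention + 0.3 * rng.normal(size=T) for _ in range(4)])
price = np.empty(T)
price[0] = 60.0
for t in range(1, T):
    price[t] = 60.0 + 0.8 * (price[t - 1] - 60.0) + 1.5 * attention[t - 1] \
        + 0.5 * rng.normal()

index = build_index(sed)
corr = lagged_corr(index, price)
pred, actual = fit_predict(index, price, n_train=150)
rmse = float(np.sqrt(np.mean((pred - actual) ** 2)))
print(f"lag-1 correlation: {corr:.3f}, out-of-sample RMSE: {rmse:.3f}")
```

To compare a single-language setup against a multilingual one in this sketch, `build_index` could be called on a subset of columns and the resulting out-of-sample errors compared; a formal comparison would require a statistical test of predictive accuracy.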
