Analysis of the prediction capability of web search data based on the HE-TDC method ‒ prediction of the volume of daily tourism visitors

Web search query data reflect social hot spots and can serve as novel economic indicators. When query data are high-dimensional, it is critical to select keywords that have plausible predictive ability, which also reduces dimensionality. This paper presents a new integrative method that combines Hurst Exponent (HE) and Time Difference Correlation (TDC) analysis to select keywords with strong predictive ability. The method, called the HE-TDC screening method, requires a keyword with predictive ability to satisfy two conditions: a high correlation with the target series, and a fluctuation memory similar to that of the target series. An empirical study applies the method to predict the daily volume of tourism visitors to the Jiuzhai Valley scenic area. The study shows that keywords selected using the HE-TDC method produce a model with better robustness and predictive ability.
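
The two screening conditions can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify how the Hurst exponent is estimated, how lags are searched, or which cutoffs are used. The sketch assumes classical rescaled-range (R/S) estimation and a lagged Pearson correlation, and the function names (`hurst_rs`, `best_lead_correlation`, `he_tdc_screen`) and thresholds (`min_corr`, `max_hurst_gap`, `max_lag`) are hypothetical.

```python
import numpy as np


def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent with classical rescaled-range (R/S) analysis.

    Assumes the series has at least 4 * min_window points, so that at least
    two window sizes are available for the log-log fit.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    log_size, log_rs = [], []
    w = min_window
    while w <= n // 2:  # window sizes double: 8, 16, 32, ...
        rs_vals = []
        for start in range(0, n - w + 1, w):  # non-overlapping windows
            chunk = x[start:start + w]
            dev = np.cumsum(chunk - chunk.mean())  # cumulative deviation
            s = chunk.std()
            if s > 0:
                rs_vals.append((dev.max() - dev.min()) / s)
        if rs_vals:
            log_size.append(np.log(w))
            log_rs.append(np.log(np.mean(rs_vals)))
        w *= 2
    # The slope of log(R/S) against log(window size) estimates H.
    slope, _ = np.polyfit(log_size, log_rs, 1)
    return slope


def best_lead_correlation(keyword, target, max_lag=7):
    """Return (lag, r) for the lead of 0..max_lag periods with the largest |r|.

    A lag of k pairs keyword[i] with target[i + k], i.e. the keyword
    leads the target by k periods.
    """
    k = np.asarray(keyword, dtype=float)
    t = np.asarray(target, dtype=float)
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        r = np.corrcoef(k[:len(k) - lag], t[lag:])[0, 1]
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r


def he_tdc_screen(keywords, target, min_corr=0.6, max_hurst_gap=0.1, max_lag=7):
    """Keep keywords that are highly correlated with the target AND have a
    Hurst exponent close to the target's. Both thresholds are assumptions."""
    h_target = hurst_rs(target)
    selected = {}
    for name, series in keywords.items():
        lag, r = best_lead_correlation(series, target, max_lag)
        h = hurst_rs(series)
        if abs(r) >= min_corr and abs(h - h_target) <= max_hurst_gap:
            selected[name] = {"lag": lag, "corr": r, "hurst": h}
    return selected
```

Here `keywords` would be a dictionary mapping each candidate search term to its daily query-volume series, aligned in time with the daily visitor series passed as `target`. The retained keywords, each shifted by its selected lag, could then serve as regressors in a forecasting model.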
