Forecasting Chinese Stock Market Prices using Baidu Search Index with a Learning-Based Data Collection Method

To address the problems of search index selection and volatility, this study proposes a learning-based search index collection method that selects the fraction of search data used for modeling by learning the best selection criteria from robust statistics. Using the selected fraction of search index data from an internet search engine (Baidu.com), we formulate a novel model for forecasting Chinese stock market prices. We test the method empirically on the two main Chinese stock market price indexes and find that its prediction accuracy is equivalent or superior to benchmarks from previous studies that used alternative search index collection methods or lagged-data prediction models. The prediction results underscore the importance of an effective data collection method for the robustness of forecast models. They also demonstrate the utility of a learning-based collection method for addressing the search index collection problem, which leads to a significant improvement in the accuracy of Chinese stock market price prediction.
