Using Internet Search Trends and Historical Trading Data for Predicting Stock Markets by the Least Squares Support Vector Regression Model

Historical trading data, which are tied to causal frameworks in both financial practice and theory, have been widely used to predict stock market values. With the popularity of social networking and Internet search tools, the ways of collecting information have diversified. Forecasting no longer relies on theoretical causality alone, and the importance of relations within data has risen. The aim of this study was therefore to investigate the performance of stock market forecasting using data from Google Trends, historical trading data (HTD), and hybrid data. The keywords used for Google Trends were collected in three ways: users' definitions (GTU), trending searches of Google Trends (GTTS), and tweets (GTT). The hybrid data combine Internet search trends from Google Trends with historical trading data. In addition, the correlation-based feature selection (CFS) technique was used to select independent variables, and a one-step-ahead policy was adopted with least squares support vector regression (LSSVR) for predicting stock markets. Numerical experiments indicate that hybrid data provide more accurate forecasts than either historical trading data alone or Google Trends data alone. Thus, using hybrid data of Internet search trends and historical trading data with LSSVR models is a promising alternative for forecasting stock markets.
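
The one-step-ahead LSSVR procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the RBF kernel, the values of the regularization parameter `gamma` and kernel width `sigma`, the expanding training window, and all function names are assumptions made for illustration. The sketch assumes each row of `X` holds the features already selected for one period (for example by CFS) and `y` holds the value to be predicted for the following period.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def lssvr_fit(X, y, gamma=10.0, sigma=1.0):
    # LSSVR replaces the SVR quadratic program with one linear system:
    #   [ 0   1^T         ] [ b     ]   [ 0 ]
    #   [ 1   K + I/gamma ] [ alpha ] = [ y ]
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvr_predict(X_train, b, alpha, X_new, sigma=1.0):
    # f(x) = sum_i alpha_i * k(x, x_i) + b
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

def one_step_ahead(X, y, n_train, gamma=10.0, sigma=1.0):
    # Assumed expanding-window scheme: refit on all periods before t,
    # then forecast only period t.
    preds = []
    for t in range(n_train, len(y)):
        b, alpha = lssvr_fit(X[:t], y[:t], gamma, sigma)
        preds.append(lssvr_predict(X[:t], b, alpha, X[t:t + 1], sigma)[0])
    return np.array(preds)
```

In practice `gamma` and `sigma` would be tuned on held-out data rather than fixed, and the hybrid setting corresponds to concatenating Google Trends columns and historical trading columns in `X` before feature selection.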
