Exploring the Linear and Nonlinear Causality Between Internet Big Data and Stock Markets

In the era of big data, stock markets are closely connected with Internet big data from diverse sources. This paper makes the first attempt to compare the linkages between stock markets and Internet big data collected from three kinds of sources: search engines, public media and social media. To this end, we propose a big data-based causality testing framework with three steps: data crawling, data mining and causality testing. Taking the Shanghai Stock Exchange and the Shenzhen Stock Exchange as the target stock markets, and web search data, news and microblogs as samples of Internet big data, we obtain three main findings. 1) There is strong bi-directional Granger causality, both linear and nonlinear, between stock markets and investors’ web search behaviors, which we attribute to shared trends and common sources of uncertainty. 2) News sentiments from public media and stock markets Granger-cause each other linearly in both directions, whereas the linear Granger causality between microblog sentiments from social media and stock markets is unidirectional, running from stock markets to microblog sentiments. 3) News sentiments explain changes in stock markets better than microblog sentiments do, which we attribute to the greater authority of public media. These results may provide useful information for both stock market investors and modelers.
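As an illustration of the causality testing step, the following is a minimal sketch of a linear Granger causality F-test in plain Python. It is not the implementation used in the paper: the function names (solve, rss, granger_f), the lag of 1 and the synthetic "search" and "returns" series are assumptions made for the example. The test compares a restricted autoregression of the effect series on its own lags with an unrestricted one that also includes lags of the candidate cause. A large F statistic suggests that the candidate cause helps predict the effect.

```python
import random


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(X, y))


def granger_f(cause, effect, lag=1):
    """F statistic for H0: lags of `cause` do not help predict `effect`."""
    rows_r, rows_u, target = [], [], []
    for t in range(lag, len(effect)):
        own = [effect[t - i] for i in range(1, lag + 1)]
        other = [cause[t - i] for i in range(1, lag + 1)]
        rows_r.append([1.0] + own)          # restricted: own lags only
        rows_u.append([1.0] + own + other)  # unrestricted: plus cause lags
        target.append(effect[t])
    rss_r, rss_u = rss(rows_r, target), rss(rows_u, target)
    dof = len(target) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic data (an assumption for illustration only): yesterday's
# search volume drives today's return, but not the other way round.
random.seed(0)
search = [random.gauss(0.0, 1.0) for _ in range(300)]
returns = [0.0] + [0.8 * search[t - 1] + 0.3 * random.gauss(0.0, 1.0)
                   for t in range(1, 300)]

f_fwd = granger_f(search, returns, lag=1)  # search -> returns
f_bwd = granger_f(returns, search, lag=1)  # returns -> search
print("F(search -> returns):", f_fwd)
print("F(returns -> search):", f_bwd)
```

Running the test in both directions, as in the last lines, is how a bi-directional relationship is distinguished from a unidirectional one. In practice each F statistic would be compared against the critical value of an F distribution with (lag, dof) degrees of freedom, and a nonlinear test would be applied separately.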
