A Study on Microblog and Search Engine User Behaviors: How Twitter Trending Topics Help Predict Google Hot Queries

Once every five minutes, Twitter publishes a list of trending topics by monitoring and analyzing tweets from its users. Similarly, Google makes available an hourly list of hot queries that have been issued to the search engine. We claim that social trends originating on Twitter may help explain and predict web trends derived from Google. Indeed, we argue that information spreading in near real time across the Twitter social network could anticipate the set of topics that users will later search for on the Web. In this work, we analyze the time series derived from the daily volume index of each trend, from either Twitter or Google. Our study on a real-world dataset reveals that about 26% of the trending topics arising from Twitter “as is” are also found as hot queries issued to Google. We also find that about 72% of the similar trends appear first on Twitter. We then assess the relation between comparable Twitter and Google trends by testing three classes of time series regression models. First, we find that Google on its own is not able to effectively predict the time behavior of its trends: autoregressive models, which try to fit the time series of Google trends, perform poorly. On the other hand, we validate the forecasting power of Twitter by showing that models which use Google as the dependent variable and Twitter as the explanatory variable retain past values of Twitter as significant 60% of the time. Moreover, we discover that a Twitter trend causes a similar Google trend to occur later about 43% of the time. Finally, we show that the best-performing models are those using past values of both Twitter and Google.
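The three classes of regression models mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic series, the lag order of 2, the coefficients, and the helper names `lagged`, `fit_rss`, and `compare_models` are all assumptions made for illustration. It fits, by ordinary least squares, a model using only past Google values (autoregressive), one using only past Twitter values (distributed lag), and one using both, then compares their residual sums of squares.

```python
import numpy as np

def lagged(series, lags):
    """Columns series[t-1], ..., series[t-lags] for t = lags .. n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])

def fit_rss(X, y):
    """Ordinary least squares with an intercept; returns the residual sum of squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid)

def compare_models(google, twitter, lags=2):
    """Fit the three model classes on the same sample (lag order is an assumption)."""
    y = google[lags:]
    g_lags = lagged(google, lags)   # past Google values
    t_lags = lagged(twitter, lags)  # past Twitter values
    return {
        "AR": fit_rss(g_lags, y),                               # Google only
        "DL": fit_rss(t_lags, y),                               # Twitter only
        "ARDL": fit_rss(np.column_stack([g_lags, t_lags]), y),  # both
    }

# Synthetic data (an assumption, not the paper's dataset): Twitter leads Google by one step.
rng = np.random.default_rng(0)
n = 200
twitter = rng.normal(size=n)
google = np.zeros(n)
for t in range(1, n):
    google[t] = 0.3 * google[t - 1] + 0.8 * twitter[t - 1] + 0.1 * rng.normal()

rss = compare_models(google, twitter)
for name, value in rss.items():
    print(f"{name}: RSS = {value:.3f}")
```

Because the combined model nests the other two and is fitted on the same sample, its in-sample residual sum of squares can never be larger than theirs. A real comparison would need out-of-sample error or significance tests on the lagged Twitter coefficients, which is closer to what the abstract describes.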
