The Detection of Emerging Trends Using Wikipedia Traffic Data and Context Networks

Because trends in society are reflected in online systems, can online media be used to predict new and emerging trends? While several recent studies have used Google Trends as the leading online information source for such research questions, we focus on the online encyclopedia Wikipedia, which is often used for deeper topical reading. Wikipedia grants open access to all of its traffic data and, beyond single keywords, provides a wealth of additional (semantic) information in a context network. Specifically, we propose and study context-normalized, time-dependent measures of a topic's importance based on page-view time series of Wikipedia articles in different languages and of the articles related to them by internal links. As an example, we present a study of the recently emerging Big Data market, focusing on the Hadoop ecosystem, and compare the capabilities of Wikipedia and Google in predicting its popularity and life cycles. To support further applications, we have developed an open web platform for sharing the results of Wikipedia analytics, providing context-rich and language-independent relevance measures for emerging trends.
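The abstract does not spell out the exact form of the context-normalized measure. The sketch below shows one plausible reading, under loudly stated assumptions: the "context" of an article is the set of articles it links to, and relevance at each time step is the central article's share of the combined page views of itself and its context. A simple rolling mean then adds a time-dependent smoothing. The function names `context_normalized_relevance` and `rolling_mean` are hypothetical, and retrieving the page-view series (per language edition) is left out.

```python
from typing import Dict, List, Sequence


def context_normalized_relevance(
    central: Sequence[float],
    neighbors: Dict[str, Sequence[float]],
) -> List[float]:
    """Hypothetical measure (an assumption, not the authors' definition):
    share of page views the central article receives relative to itself
    plus its linked-article context, computed per time step."""
    n = len(central)
    for name, series in neighbors.items():
        if len(series) != n:
            raise ValueError(
                f"series for {name!r} has length {len(series)}, expected {n}"
            )
    relevance = []
    for t in range(n):
        # Total attention paid to the context (linked articles) at step t.
        context_views = sum(series[t] for series in neighbors.values())
        total = central[t] + context_views
        # Avoid division by zero on days without any recorded traffic.
        relevance.append(central[t] / total if total > 0 else 0.0)
    return relevance


def rolling_mean(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; returns len(values) - window + 1 points."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


if __name__ == "__main__":
    # Toy daily page views; "Hadoop" and the neighbor names are illustrative only.
    hadoop = [10, 20, 40]
    context = {"Apache_Spark": [30, 30, 30], "MapReduce": [60, 50, 30]}
    r = context_normalized_relevance(hadoop, context)
    print(r)
    print(rolling_mean(r, 2))
```

With these toy numbers the central article's share grows from 10% to 40% of its neighborhood's traffic, which is the kind of rising, context-relative signal such a measure is meant to expose. Normalizing by the context rather than by raw totals makes the measure less sensitive to overall traffic changes in a language edition, which is one reason a context-based normalization is attractive for cross-language comparison.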