Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda

Effective disease monitoring provides a foundation for effective public health systems. This has historically been accomplished with patient contact and bureaucratic aggregation, which tends to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parameters. However, the question of when and how these approaches work remains open. We addressed this question using Wikipedia access logs and category links. Our experiments, replicable and extensible using our open source code and data, test the effect of semantic article filtering, amount of training data, forecast horizon, and model staleness by comparing across 6 diseases and 4 countries using thousands of individual models. We found that our minimal-configuration, language-agnostic article selection process based on semantic relatedness is effective for improving predictions, and that our approach is relatively insensitive to the amount and age of training data. We also found, in contrast to prior work, very little forecasting value, and we argue that this is consistent with theoretical considerations about the nature of forecasting. These mixed results lead us to propose that the currently observational field of internet-based disease surveillance must pivot to include theoretical models of information flow as well as controlled experiments based on simulations of disease.
