Enhancing disease surveillance with novel data streams: challenges and opportunities

Novel data streams (NDS), such as web search data or social media updates, hold promise for enhancing the capabilities of public health surveillance. In this paper, we outline a conceptual framework for integrating NDS into current public health surveillance. Our approach focuses on two key questions: What are the opportunities for using NDS, and what are the minimal tests of validity and utility that must be applied when using NDS? Identifying these opportunities will necessitate the involvement of public health authorities and an appreciation of the diversity of objectives and scales across agencies at different levels (local, state, national, international). We argue that three steps are critical: clearly articulating surveillance objectives, systematically evaluating NDS, and comparing the performance of NDS against both existing surveillance data and alternative NDS. These steps have not been sufficiently addressed in many applications of NDS currently in the literature.
