Cumulative Query Method for Influenza Surveillance Using Search Engine Data

Background: Internet search queries have become an important data source for syndromic surveillance systems. However, there is currently no syndromic surveillance system in South Korea that uses Internet search query data.

Objectives: The objective of this study was to examine correlations between our cumulative query method and national influenza surveillance data.

Methods: Our study was based on data from the local search engine Daum (approximately 25% market share) and influenza-like illness (ILI) data from the Korea Centers for Disease Control and Prevention. A quota sampling survey was conducted with 200 participants to obtain popular queries. We divided the study period into two sets: Set 1 (the 2009/10 epidemiological year for development set 1 and 2010/11 for validation set 1) and Set 2 (2010/11 for development set 2 and 2011/12 for validation set 2). Pearson's correlation coefficients were calculated between the Daum data and the ILI data for the development set. We selected the combined queries whose correlation coefficients were .7 or higher and listed them in descending order. We then defined cumulative query method n, where n is the number of combined queries accumulated in descending order of correlation coefficient.

Results: In validation set 1, 13 cumulative query methods were applied, and 8 had higher correlation coefficients (min=.916, max=.943) than that of the highest single combined query. Further, 11 of 13 cumulative query methods had an r value of ≥.7, whereas only 4 of 13 combined queries did. In validation set 2, 8 of 15 cumulative query methods showed higher correlation coefficients (min=.975, max=.987) than that of the highest single combined query. All 15 cumulative query methods had an r value of ≥.7, whereas only 6 of 15 combined queries did.

Conclusions: The cumulative query method showed relatively higher correlations with national influenza surveillance data than the combined queries did in both the development and validation sets.
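The selection and accumulation steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the dictionary-of-weekly-series data layout, and the choice to combine the top n queries by summing their weekly volumes are all assumptions made for illustration, since the abstract does not specify how the queries are accumulated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_queries(dev_queries, dev_ili, threshold=0.7):
    """Keep queries with r >= threshold on the development set,
    sorted in descending order of r.

    dev_queries: dict mapping a query name to its weekly volume series
    (hypothetical layout).
    """
    scored = [(name, pearson(series, dev_ili))
              for name, series in dev_queries.items()]
    kept = [(name, r) for name, r in scored if r >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def cumulative_series(queries, ranked_names, n):
    """Cumulative query method n: combine the top-n ranked queries.

    Assumption: combining means summing weekly volumes.
    """
    top = ranked_names[:n]
    weeks = len(queries[top[0]])
    return [sum(queries[name][t] for name in top) for t in range(weeks)]


def evaluate(dev_queries, dev_ili, val_queries, val_ili, threshold=0.7):
    """Rank on the development set, then correlate each cumulative
    query method n with ILI on the validation set."""
    ranked = [name for name, _ in rank_queries(dev_queries, dev_ili, threshold)]
    return {
        n: pearson(cumulative_series(val_queries, ranked, n), val_ili)
        for n in range(1, len(ranked) + 1)
    }
```

Calling `evaluate` with development-year and validation-year data returns one validation correlation per cumulative query method n, which can then be compared against the correlation of the single best combined query.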
