FluBreaks: Early Epidemic Detection from Google Flu Trends

Background: The Google Flu Trends service was launched in 2008 to track changes in the volume of online search queries related to flu-like symptoms. Over the last few years, the trend data produced by this service have shown a consistent relationship with the actual number of flu reports collected by the US Centers for Disease Control and Prevention (CDC), often identifying increases in flu cases weeks in advance of CDC records. However, contrary to popular belief, Google Flu Trends is not an early epidemic detection system. Instead, it is designed as a baseline indicator of the trend, or changes, in the number of disease cases.

Objective: To evaluate whether these trends can be used as a basis for an early warning system for epidemics.

Methods: We present the first detailed algorithmic analysis of how Google Flu Trends can be used to build a fully automated system that warns of epidemics earlier than the methods used by the CDC. Based on this analysis, we present a novel early epidemic detection system, called FluBreaks (dritte.org/flubreaks), built on Google Flu Trends data. We compared the accuracy and practicality of three types of algorithms: normal distribution algorithms, Poisson distribution algorithms, and negative binomial distribution algorithms. We explored the relative merits of these methods and related our findings to changes in Internet penetration and population size for the regions in Google Flu Trends providing data.

Results: Across our performance metrics of percentage true-positives (RTP), percentage false-positives (RFP), percentage overlap (OT), and percentage early alarms (EA), Poisson- and negative binomial-based algorithms performed better than normal distribution-based algorithms on all metrics except RFP. Poisson-based algorithms had average values of 99%, 28%, 71%, and 76% for RTP, RFP, OT, and EA, respectively, whereas negative binomial-based algorithms had average values of 97.8%, 17.8%, 60%, and 55% for RTP, RFP, OT, and EA, respectively.
Moreover, EA was also affected by the region's population size. Regions with larger populations (regions 4 and 6) had higher values of EA than region 10 (which had the smallest population) for both negative binomial- and Poisson-based algorithms. The difference averaged 12.5% and 13.5% for negative binomial- and Poisson-based algorithms, respectively.

Conclusions: We present the first detailed comparative analysis of popular early epidemic detection algorithms on Google Flu Trends data. We note that realizing the early-warning potential of these data requires moving beyond the normal distribution approaches traditionally employed by the CDC, which are based on the cumulative sum and historical limits methods, to negative binomial- and Poisson-based algorithms that can deal with potentially noisy search query data from regions with varying population sizes and Internet penetration. Based on our work, we have developed FluBreaks, an early warning system for flu epidemics using Google Flu Trends.
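To make the idea of a Poisson-based detection algorithm concrete, the following is a minimal illustrative sketch and not the FluBreaks implementation. It assumes the weekly values are non-negative integer counts, takes the mean of a sliding baseline window as the Poisson rate, and raises an alarm when the current count would be improbably high under that rate. The function names (`poisson_sf`, `poisson_alarms`), the window length `baseline_weeks`, the significance level `alpha`, and the sample series are all hypothetical choices made for illustration.

```python
import math


def poisson_sf(k, mean):
    """Return P(X >= k) for X ~ Poisson(mean), using the exact pmf."""
    if k <= 0:
        return 1.0
    # Accumulate P(X = 0) + ... + P(X = k - 1), then take the complement.
    term = math.exp(-mean)
    cdf = term
    for i in range(1, k):
        term *= mean / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def poisson_alarms(counts, baseline_weeks=4, alpha=0.01):
    """Return indices of weeks whose count is improbably high.

    Assumption (not from the paper): the Poisson rate for week t is the
    mean of the preceding `baseline_weeks` counts, and an alarm is raised
    when P(X >= counts[t]) falls below `alpha`.
    """
    alarms = []
    for t in range(baseline_weeks, len(counts)):
        baseline = counts[t - baseline_weeks:t]
        # Guard against a zero rate when the whole baseline is zero.
        mean = max(sum(baseline) / baseline_weeks, 1e-9)
        if poisson_sf(counts[t], mean) < alpha:
            alarms.append(t)
    return alarms


# Made-up weekly counts with a sharp rise starting at index 6.
weekly = [10, 12, 9, 11, 10, 13, 30, 45, 50, 20]
print(poisson_alarms(weekly))  # → [6, 7, 8]
```

In this sketch the baseline window includes recent outbreak weeks, so the estimated rate rises during an epidemic and alarms eventually stop. Production systems typically handle this with guard bands or by excluding flagged weeks, and a negative binomial variant would additionally estimate overdispersion from the baseline.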
