Learning from pandemics: using extraordinary events can improve disease now-casting models

Online searches have been used to study different health-related behaviours, including monitoring disease outbreaks. An obvious caveat is that individuals seek online information for many reasons, and models that are blind to people's motivations are of limited use and can even mislead. This is particularly true during extraordinary public health crises, such as the ongoing pandemic, when fear, curiosity and many other reasons can lead individuals to search for health-related information, masking the disease-driven searches. However, health crises can also offer an opportunity to disentangle the different drivers and learn about human behaviour. Here, we focus on the two pandemics of the 21st century (2009 H1N1 flu and Covid-19) and propose a methodology to discriminate between search patterns linked to general information seeking (media driven) and search patterns possibly more associated with actual infection (disease driven). We show that by learning from such pandemic periods, with high anxiety and media hype, it is possible to select online searches and improve model performance in both pandemic and seasonal settings. Moreover, despite the common claim that more data is always better, our results indicate that a lower volume of the right data can be better than large volumes of apparently similar data, especially in the long run. Our work provides a general framework that can be applied beyond specific events and diseases, and argues that algorithms can be improved simply by using less (but better) data. This has important consequences, for example, for solving the accuracy-explainability trade-off in machine learning.
