Predicting social response to infectious disease outbreaks from internet-based news streams

Infectious disease outbreaks often have consequences beyond human health, including public concern, economic instability, and sometimes violence. A warning system that anticipates social disruption resulting from disease outbreaks is urgently needed to help decision makers prepare appropriately. We designed a system that operates in near real time to identify and predict social response. HealthMap provided over 150,000 Internet-based news articles related to outbreaks of 16 diseases in 72 countries and territories. These articles were automatically tagged with indicators of disease activity and population reaction. An anomaly detection algorithm was applied to the population reaction indicators to identify periods of unusually severe social response. A model was then developed to predict the probability of such periods occurring within the coming 1, 2, and 3 weeks. The model performed well for diseases with substantial media coverage. For country-disease pairs with a median of 20 or more articles per year, the onset of social response in the next week was correctly predicted over 60% of the time, and 87% of weeks were correctly predicted. Performance was weaker for diseases with little media coverage; for these diseases, the main utility of our system lies in identifying social response when it occurs rather than predicting when it will happen. Overall, this near real-time prediction approach is a promising step toward predictive models that inform responders of the likely social consequences of disease spread.
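
The abstract does not say which anomaly detection algorithm was used on the population reaction indicators. As a minimal sketch of the idea only, the Python below flags unusual weeks with an exponentially weighted moving average (EWMA) control chart, which is an assumed stand-in rather than the authors' method. The function name `ewma_anomalies`, its parameters, and the weekly counts are all hypothetical.

```python
from statistics import mean, pstdev


def ewma_anomalies(counts, baseline_weeks=8, lam=0.3, width=3.0):
    """Flag weeks whose EWMA exceeds an upper control limit.

    Assumption: `counts` holds one population-reaction indicator per week,
    and the first `baseline_weeks` values represent normal conditions.
    """
    baseline = counts[:baseline_weeks]
    mu = mean(baseline)
    sigma = pstdev(baseline)

    # Steady-state upper control limit of an EWMA chart.
    limit = mu + width * sigma * (lam / (2 - lam)) ** 0.5

    z = mu  # the EWMA statistic starts at the baseline mean
    flags = []
    for x in counts:
        z = lam * x + (1 - lam) * z
        flags.append(z > limit)
    return flags


# Hypothetical weekly counts of articles tagged with a reaction indicator.
weekly_counts = [2, 3, 2, 4, 3, 2, 3, 3, 3, 2, 15, 20, 4, 3]
flags = ewma_anomalies(weekly_counts)
anomalous_weeks = [i for i, flagged in enumerate(flags) if flagged]
print(anomalous_weeks)
```

Because the EWMA smooths the series, a flagged period can persist for a few weeks after the counts fall back, which is one way to treat "periods" of severe response rather than isolated spikes. A predictive model like the one the abstract describes would then take these weekly flags as its target.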
