Forecasting influenza-like illness dynamics for military populations using neural networks and social media

This work is the first to use recurrent neural networks to predict influenza-like illness (ILI) dynamics from linguistic signals extracted from social media data. Unlike other approaches that rely on time-series analysis of historical ILI data and state-of-the-art machine learning models, we build and evaluate the predictive power of neural network architectures based on Long Short-Term Memory (LSTM) units capable of nowcasting (predicting in “real-time”) and forecasting (predicting the future) ILI dynamics in the 2011–2014 influenza seasons. To build our models, we integrate information that people post on social media, e.g., topics, embeddings, word n-grams, stylistic patterns, and communication behavior using hashtags and mentions. We then quantitatively evaluate the predictive power of different social media signals and contrast the performance of state-of-the-art regression models with that of neural networks using a diverse set of evaluation metrics. Finally, we combine ILI and social media signals to build a joint neural network model for ILI dynamics prediction. Unlike the majority of existing work, we focus on developing models for local rather than national ILI surveillance, and for military rather than general populations, in 26 U.S. and six international locations, and we analyze how model performance depends on the amount of social media data available per location.

Our approach demonstrates several advantages: (a) Neural network architectures that rely on LSTM units trained on social media data yield the best performance compared to previously used regression models. (b) Previously under-explored language and communication behavior features are more predictive of ILI dynamics than stylistic and topic signals expressed in social media. (c) Neural network models learned exclusively from social media signals yield performance comparable to or better than that of models learned from historical ILI data; thus, signals from social media can potentially be used to accurately forecast ILI dynamics for regions where historical ILI data is not available. (d) Neural network models learned from combined ILI and social media signals significantly outperform models that rely solely on historical ILI data, which underscores the potential of alternative public data sources for ILI dynamics prediction. (e) Location-specific models outperform previously used location-independent models, e.g., U.S. only. (f) Prediction results vary significantly across geolocations depending on the amount of social media data available and on ILI activity patterns. (g) Model performance improves as more tweets become available per geolocation, e.g., the error decreases and the Pearson score increases for locations with more tweets.
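The abstract reports model quality in terms of an error and a Pearson score. As a minimal sketch of how such scores could be computed for a predicted versus an observed ILI series, the following uses plain Python. The function names and the choice of root-mean-square error (RMSE) as the error measure are assumptions made for illustration; the abstract does not state which error metric was used.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root-mean-square error (an assumed error metric, see lead-in)."""
    n = len(observed)
    if n != len(predicted) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    # Hypothetical weekly ILI proportions for one location (not real data).
    observed = [0.010, 0.014, 0.021, 0.030, 0.026, 0.018]
    predicted = [0.011, 0.013, 0.019, 0.027, 0.028, 0.020]
    print("Pearson r:", round(pearson_r(observed, predicted), 3))
    print("RMSE:", round(rmse(observed, predicted), 4))
```

Under this reading, a better model for a given location would show a lower RMSE and a Pearson score closer to 1.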
