Predicting Infectious Disease Using Deep Learning and Big Data

Infectious disease occurs when a person is infected by a pathogen from another person or an animal. It causes harm at both individual and macro scales. The Korea Center for Disease Control (KCDC) operates a surveillance system to minimize the spread of infectious disease. However, missing and delayed reports make it difficult for this system to act against infectious disease immediately. Moreover, infectious disease trends are not known in advance, which makes prediction difficult. This study predicts infectious diseases by optimizing the parameters of deep learning algorithms while considering big data, including social media data. The performance of deep neural network (DNN) and long short-term memory (LSTM) models was compared with that of the autoregressive integrated moving average (ARIMA) model when predicting three infectious diseases one week into the future. The results show that the DNN and LSTM models perform better than ARIMA. When predicting chickenpox, the top-10 DNN and LSTM models improved average performance by 24% and 19%, respectively. The DNN model performed stably, and the LSTM model was more accurate when infectious disease was spreading. We believe that this study’s models can help eliminate reporting delays in existing surveillance systems and, therefore, minimize costs to society.
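
The comparison described above, one-week-ahead prediction scored against a baseline, can be illustrated with a minimal sketch. The abstract does not state the error metric, so root-mean-square error (RMSE) is an assumption here. The case counts are made up, and the two simple forecasters are hypothetical stand-ins, not the study's DNN, LSTM, or ARIMA models. All function names are illustrative.

```python
import math

def one_week_ahead_pairs(series, n_lags):
    """Turn a weekly series into (lag window, next-week value) pairs."""
    pairs = []
    for t in range(n_lags, len(series)):
        pairs.append((series[t - n_lags:t], series[t]))
    return pairs

def rmse(actual, predicted):
    """Root-mean-square error (assumed metric; not stated in the abstract)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_error, model_error):
    """Fraction by which the model reduces the baseline's error."""
    return (baseline_error - model_error) / baseline_error

# Made-up weekly case counts, for illustration only.
weekly_cases = [10, 12, 15, 20, 26, 30, 28, 22]
pairs = one_week_ahead_pairs(weekly_cases, n_lags=2)
actual = [target for _, target in pairs]

# Stand-in baseline: next week equals the most recent week.
persistence = [window[-1] for window, _ in pairs]
# Stand-in alternative: extrapolate the last week-over-week change.
trend = [window[-1] + (window[-1] - window[-2]) for window, _ in pairs]

improvement = relative_improvement(rmse(actual, persistence), rmse(actual, trend))
print(f"Relative error reduction over the baseline: {improvement:.1%}")
```

Under this reading, a statement such as "improved average performance by 24%" would correspond to a relative error reduction of 0.24 averaged over the selected models, though the abstract itself does not define how the percentage was computed.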
