An ensemble forecast model of dengue in Guangzhou, China using climate and social media surveillance data.

BACKGROUND China experienced an unprecedented dengue outbreak in 2014, when the number of cases reached its highest level in 25 years. Official case counts are released with a significant delay, so the ability to track the timing and magnitude of local dengue outbreaks in a timely manner remains limited.

MATERIAL AND METHODS We developed an ensemble penalized regression algorithm (EPRA) for initializing near-real-time forecasts of the dengue epidemic trajectory. The EPRA integrates different penalties (LASSO, Ridge, Elastic Net, SCAD and MCP) with iterative sampling and model averaging. The inputs were multiple streams of near-real-time data, namely dengue-related Baidu searches, Sina Weibo posts and climatic conditions, together with historical dengue incidence. We compared the predictive power of the EPRA with that of the alternatives, penalized regression models that each use a single penalty, in retrospectively forecasting weekly dengue incidence and in detecting outbreaks defined by different cutoffs, for 2011-2016 in Guangzhou, south China.

RESULTS The EPRA performed best, or at least comparably to the alternatives, in 1- and 2-week-ahead out-of-sample forecasts and in leave-one-out cross-validation. These findings indicate that skillful near-real-time forecasts of dengue, together with confidence in those predictions, are achievable. In outbreak detection, the EPRA predicted periods of high dengue incidence more accurately than the alternatives.

CONCLUSION This study developed a statistically rigorous approach to near-real-time forecasting of dengue in China. The EPRA provides skillful forecasts and can serve as a timely, complementary way to assess dengue dynamics, which will help in designing interventions to mitigate dengue transmission.
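The general idea of combining several penalties with repeated sampling and model averaging can be illustrated with a minimal sketch. This is not the authors' EPRA implementation: it assumes plain bootstrap resampling as the sampling step, equal-weight averaging, a single fixed penalty strength `lam`, and only the LASSO, Elastic Net and Ridge penalties (SCAD and MCP are omitted). The function names `elastic_net_fit` and `ensemble_forecast` and the synthetic data are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_fit(X, y, lam, alpha=1.0, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).
    alpha=1 gives LASSO, alpha=0 gives Ridge. X and y are assumed centered."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # drop feature j from the fit
            rho = X[:, j] @ resid / n           # partial correlation with residual
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            resid -= X[:, j] * beta[j]          # add the updated feature back
    return beta

def ensemble_forecast(X, y, x_new, lam=0.05, alphas=(1.0, 0.5, 0.0),
                      n_boot=30, seed=0):
    """Average predictions over bootstrap resamples and penalties.
    Returns the mean prediction and a 2.5-97.5 percentile band (assumed
    here as a rough spread measure, not a calibrated interval)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        mu_x, mu_y = Xb.mean(axis=0), yb.mean()
        for alpha in alphas:
            beta = elastic_net_fit(Xb - mu_x, yb - mu_y, lam, alpha)
            preds.append(mu_y + (x_new - mu_x) @ beta)
    preds = np.asarray(preds)
    return preds.mean(), np.percentile(preds, [2.5, 97.5])

if __name__ == "__main__":
    # Synthetic stand-in for lagged predictors (search index, posts, climate).
    rng = np.random.default_rng(42)
    X = rng.normal(size=(150, 6))
    y = 1.5 * X[:, 0] + 0.8 * X[:, 2] + 0.2 * rng.normal(size=150)
    point, band = ensemble_forecast(X, y, X[-1])
    print("forecast:", point, "band:", band)
```

In a real weekly-incidence setting, the rows of `X` would hold lagged predictors so that each fit only uses information available at forecast time, and a resampling scheme that respects temporal dependence may be preferable to the plain bootstrap assumed here.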
