Monitoring seasonal influenza epidemics by using internet search data with an ensemble penalized regression model

Seasonal influenza epidemics cause serious public health problems in China. Search query-based surveillance was recently proposed to complement traditional approaches to monitoring influenza epidemics. However, developing robust techniques for selecting search queries and improving the predictability of influenza epidemics remain challenging. This study aimed to develop a novel ensemble framework that improves penalized regression models for detecting influenza epidemics using Baidu search engine query data from China. The ensemble framework combined bootstrap aggregating (bagging) with rank aggregation to optimize penalized regression models. Lasso, ridge and elastic net regression, together with their counterparts in the proposed ensemble framework, were compared using Baidu search engine queries. Most of the selected search terms captured the peaks and troughs of the time series curves of influenza cases. The proposed ensemble framework improved the predictability of the conventional penalized regression models. The elastic net regression model outperformed the compared models, with the lowest prediction errors. We established a Baidu search query-based surveillance model for monitoring influenza epidemics; the proposed model provides a useful tool to support the public health response to influenza and other infectious diseases.
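
The general idea of combining bagging with rank aggregation for a penalized regression model can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: only ridge regression is used (in closed form), the candidate models differ only in their penalty value, each candidate is scored on the out-of-bag rows by RMSE and MAE, and the per-metric ranks are combined with a simple mean rank rather than a weighted rank aggregation method. The function names (`fit_ridge`, `mean_rank`, `bagged_ridge`, `predict`) and the synthetic data are hypothetical and stand in for real search query series.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit on centered data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ coef, coef

def mean_rank(score_table):
    """Rank candidates within each metric column (0 = lowest error), then average."""
    ranks = np.argsort(np.argsort(score_table, axis=0), axis=0)
    return ranks.mean(axis=1)

def bagged_ridge(X, y, lambdas, n_bags=50, seed=0):
    """For each bootstrap sample, keep the penalty with the best aggregated rank."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)   # out-of-bag rows used for scoring
        if oob.size == 0:
            continue
        fits = [fit_ridge(X[idx], y[idx], lam) for lam in lambdas]
        scores = []
        for intercept, coef in fits:
            err = y[oob] - (intercept + X[oob] @ coef)
            scores.append([np.sqrt(np.mean(err ** 2)), np.mean(np.abs(err))])
        best = int(np.argmin(mean_rank(np.array(scores))))
        models.append(fits[best])
    return models

def predict(models, X):
    """Average the predictions of all bagged models."""
    return np.mean([intercept + X @ coef for intercept, coef in models], axis=0)

# Synthetic stand-in for weekly query volumes (10 series) and case counts.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=120)

models = bagged_ridge(X[:80], y[:80], lambdas=[0.01, 0.1, 1.0, 10.0])
pred = predict(models, X[80:])
rmse = float(np.sqrt(np.mean((y[80:] - pred) ** 2)))
```

In this sketch the out-of-bag rows play the role of a validation set for each bootstrap replicate, so no separate tuning split is needed. Swapping in lasso or elastic net fits, or more error metrics, would only change `fit_ridge` and the columns of the score table.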
