A Windowed Correlation-Based Feature Selection Method to Improve Time Series Prediction of Dengue Fever Cases

The performance of data-driven models depends on the training samples available. In many locations, historical incidence data are too limited to predict dengue fever cases accurately. This work aims to enhance temporally limited dengue case data by methodically adding epidemically relevant case data from nearby locations as predictors (features). A novel framework is presented for windowing incidence data and computing time-shifted correlation-based metrics to quantify feature relevance. The framework ranks the incidence data of locations adjacent to a target by combining metrics based on correlation, spatial distance, and local prevalence. With the proposed method, recurrent neural network models improve in accuracy by up to 33.6% on average. These models achieve mean absolute error (MAE) values as low as 0.128 on incidence data normalized to [0, 1] for the municipality with the highest dengue prevalence in Brazil’s Espirito Santo. When predicting aggregate cases over geographical ecoregions, the models improve by 16.5% while using only 6.5% of the ranked incidence data. This paper also presents two methods for allocating correlation windows: fixed-size and outbreak detection. Both perform comparably well, although the outbreak detection method uses less data for its computations. The proposed framework is general and can be used to improve time-series predictions on many spatiotemporal datasets.
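The idea of windowed, time-shifted correlation can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the 52-sample window, the maximum lag of 8, and the choice to average the best lagged Pearson coefficient over fixed-size windows are all assumptions made for illustration, and the spatial distance and prevalence metrics are omitted.

```python
import numpy as np

def lagged_pearson(target, candidate, lag):
    """Pearson r between target[t] and candidate[t - lag], for lag >= 0."""
    if lag > 0:
        target, candidate = target[lag:], candidate[:-lag]
    # Guard against windows that are too short or constant (r undefined).
    if len(target) < 2 or np.std(target) == 0 or np.std(candidate) == 0:
        return 0.0
    return float(np.corrcoef(target, candidate)[0, 1])

def windowed_relevance(target, candidate, window=52, max_lag=8):
    """Average, over fixed-size windows, of the best lagged correlation.

    `window` and `max_lag` are illustrative defaults, not values from the paper.
    """
    target = np.asarray(target, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    scores = []
    for start in range(0, len(target) - window + 1, window):
        t = target[start:start + window]
        c = candidate[start:start + window]
        scores.append(max(lagged_pearson(t, c, lag) for lag in range(max_lag + 1)))
    return float(np.mean(scores)) if scores else 0.0

# Synthetic example: a neighbor whose cases lead the target by 3 steps,
# and an unrelated noise series.
steps = np.arange(104)
neighbor = 0.5 + 0.5 * np.sin(2 * np.pi * steps / 52)
target = np.roll(neighbor, 3)            # target[t] == neighbor[t - 3]
noise = np.random.default_rng(0).random(104)

score_neighbor = windowed_relevance(target, neighbor)
score_noise = windowed_relevance(target, noise)
print(score_neighbor, score_noise)
```

In this sketch the leading neighbor scores higher than the noise series, so it would be ranked first as a candidate feature. An outbreak-detection variant would replace the fixed `range(...)` window starts with windows placed around detected outbreaks.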
