A novel model for malaria prediction based on ensemble algorithms

Background and objective: Most previous studies used a single traditional time series model to predict malaria incidence, but a single model cannot effectively capture all the properties of the data structure. A stacking architecture addresses this limitation by combining distinct algorithms and models. This study compares the performance of traditional time series models and deep learning algorithms in malaria case prediction and explores the value of stacking methods for infectious disease prediction.

Methods: The ARIMA, STL+ARIMA, BP-ANN and LSTM network models were applied separately in simulations using malaria data and meteorological data from Yunnan Province for 2011 to 2017. The predictive performance of each model was compared using three evaluation measures: root mean square error (RMSE), mean absolute scaled error (MASE) and mean absolute deviation (MAD). Gradient-boosting regression trees (GBRTs) were then used to combine the four models, and we assessed whether the stacking structure improved prediction performance.

Results: The RMSEs of the four sub-models were 13.176, 14.543, 9.571 and 7.208; the MASEs were 0.469, 0.472, 0.296 and 0.266; and the MADs were 6.403, 7.658, 5.871 and 5.691. After the four models were combined through the stacking architecture, the RMSE, MASE and MAD of the ensemble model decreased to 6.810, 0.224 and 4.625, respectively.

Conclusions: A novel ensemble model based on the robustness of structured prediction and model combination through stacking was developed. The findings suggest that the predictive performance of the final model is superior to that of the four sub-models, indicating that stacking architecture may have significant implications for infectious disease prediction.
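To make the evaluation measures and the stacking step concrete, here is a minimal pure-Python sketch; it is not the authors' implementation. Several things are assumptions made for illustration: MAD is taken to be the mean absolute difference between observed and predicted values, MASE is scaled by the in-sample error of a one-step naive forecast, the meta-learner is a simplified gradient-boosting loop over depth-1 regression trees (stumps) with squared loss, and all numbers are invented toy values rather than the Yunnan data. In a real stacking setup the meta-learner's inputs would be out-of-sample predictions from the fitted sub-models.

```python
from math import sqrt

def rmse(actual, pred):
    """Root mean square error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mad(actual, pred):
    """Mean absolute deviation between observed and predicted values (assumed definition)."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mase(actual, pred, train):
    """Mean absolute scaled error: MAD divided by the in-sample one-step naive-forecast error."""
    scale = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    return mad(actual, pred) / scale

def fit_stump(X, residuals):
    """Find the single split (feature, threshold) that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]

def fit_gbrt(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting with squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        j, thr, lmean, rmean = fit_stump(X, residuals)
        stumps.append((j, thr, lmean, rmean))
        pred = [p + lr * (lmean if row[j] <= thr else rmean) for row, p in zip(X, pred)]
    return base, lr, stumps

def predict_gbrt(model, X):
    """Sum the shrunken stump outputs on top of the initial mean prediction."""
    base, lr, stumps = model
    return [base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)
            for row in X]

# Invented toy data: observed monthly cases and the predictions of four
# hypothetical sub-models (columns stand in for ARIMA, STL+ARIMA, BP-ANN, LSTM).
train_actual = [12, 18, 30, 45, 60, 52, 38, 25, 15, 10, 9, 11]
X_train = [[14, 15, 11, 12], [20, 21, 19, 17], [27, 26, 33, 31], [40, 38, 47, 44],
           [55, 52, 63, 61], [55, 57, 50, 53], [41, 43, 36, 37], [28, 30, 22, 26],
           [18, 19, 14, 16], [12, 13, 11, 10], [10, 11, 8, 9], [10, 9, 12, 11]]
test_actual = [14, 22, 35]
X_test = [[16, 17, 13, 15], [24, 25, 20, 21], [31, 29, 37, 36]]

# Stack: the meta-learner maps sub-model predictions to the observed counts.
model = fit_gbrt(X_train, train_actual)
stacked_test = predict_gbrt(model, X_test)

print("RMSE:", rmse(test_actual, stacked_test))
print("MASE:", mase(test_actual, stacked_test, train_actual))
print("MAD: ", mad(test_actual, stacked_test))
```

The same three functions can be applied to each sub-model's column on its own, which is how a comparison between the sub-models and the ensemble would be set up.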
