Comparison of national level spatial and spatio-temporal models of malaria

Geospatial statistical models play an important role in malaria control and prevention; they are widely used to produce malaria risk maps, which are essential for guiding the efficient allocation of intervention resources. Although many models are available for spatial mapping, the most commonly used model in the literature is the Bayesian geostatistical model (BGM), which is based on an underlying Gaussian process. To our knowledge, methods such as splines and decision tree ensemble methods have not been compared with respect to their predictive skill for country-level malaria prevalence mapping. Moreover, as more countries now have multiple datasets collected over the past decade, it is critical to evaluate whether the inclusion of past datasets and the use of spatio-temporal models improve the prediction accuracy of the present spatial distribution of malaria. Here we compare the prediction accuracy of five models under spatial and spatio-temporal settings in five African countries. The five models are stepwise logistic regression, generalized additive model (GAM), gradient boosted trees (GBM), Bayesian additive regression trees (BART) and the BGM. There is no single best model for predicting malaria prevalence on a national scale. Model performance varied from country to country, and between the spatial and spatio-temporal settings. In general, the BGM, GAM and BART models performed well, with BGM being the most consistent. The inclusion of past data is not always beneficial: the predictive performance of GAM and GBM increased under the spatio-temporal setting, but BGM's performance decreased in most of the countries. An accurate depiction of malaria risk is important, and statistical assumptions that suit one country do not always fit others, so a wide range of models and settings should be used. Doing so helps ensure that the best available modeling approach is found and can provide additional insight into the spatial distribution of malaria risk.
Author summary

Malaria still affects hundreds of millions of people every year and kills hundreds of thousands. As the majority of malaria intervention and control policies are developed at the national level, accurate spatial prediction of malaria risk is important. Choosing the best modeling approach for prediction is not straightforward. Here we compare the predictive performance of five models in five countries, with and without data from multiple surveys, to provide empirical evidence on whether there is a single best model for national-level malaria prediction, and whether the inclusion of past data is beneficial in predicting the current distribution of malaria risk. We find that model performance varies from country to country and that there is no single best model. Although the Bayesian geostatistical model is widely used in the literature, its performance is not necessarily superior to that of other simpler-to-fit methods such as the generalized additive model (splines) and Bayesian additive regression trees. Importantly, we also show that the incorporation of past data does not always improve spatial predictions of current disease risk. Together, these findings demonstrate the importance of fitting a wide range of models as part of the prediction mapping process, instead of relying on a one-size-fits-all model.
