Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach

Background

Malaria risk maps play an increasingly important role in disease control planning, implementation, and evaluation. The construction of these maps using modern geospatial techniques relies on covariate grids: continuous surfaces quantifying environmental factors that partially explain spatial heterogeneity in malaria endemicity. Although crucial, past variable selection processes for this purpose have often been subjective and ad hoc, with many covariates used in modeling with little quantitative justification.

Methods

This research consists of an extensive covariate construction and selection process for predicting Plasmodium falciparum parasite rates (PfPR) in Africa for the years 2000-2012. First, a literature review was conducted to establish a comprehensive list of covariates used for malaria mapping. Second, a library of covariate data was assembled to reflect this list, a process that included the construction of multiple temporally dynamic datasets. Third, the resulting set of covariates was used to create more than 50 million possible covariate terms via factorial combinations of different spatial and temporal aggregations, transformations, and pairwise interactions. Fourth, the expanded set of covariates was reduced via successive selection criteria to yield a robust covariate subset, which was assessed using an out-of-sample validation approach.

Results

The final covariate subset consisted predominantly of dynamic covariates, and it substantially outperformed earlier sets used by the Malaria Atlas Project (MAP) for creating global malaria risk maps, with the pseudo-R² value for the out-of-sample validation increasing from 0.43 to 0.52. Dynamic covariates improved the model, with 17 of the 20 new covariates consisting of monthly or annual products, but the selected covariates were typically interaction terms that included both dynamic and synoptic datasets. Thus the interplay between normal conditions (i.e., long-term averages) and immediate conditions may be key for characterizing environmental controls on parasite rate.

Conclusions

This analysis represents the first effort to systematically audit covariate utility for malaria mapping and then derive an objective, empirically based set of environmental covariates for modeling PfPR. The new covariates produce more reliable representations of malaria risk patterns and of how those patterns are changing through time. These covariates will be used to characterize spatially and temporally varying environmental conditions affecting PfPR within a geostatistical modeling framework, building upon previous research by MAP that produced global malaria maps for 2007 and 2010.
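The factorial expansion described in the Methods can be illustrated with a small sketch. It is not the authors' implementation: the covariate names, temporal aggregations, and transformations below are hypothetical placeholders, and the sketch only shows how crossing a modest number of options and then forming pairwise interactions makes the number of candidate terms grow rapidly.

```python
from itertools import combinations, product

# Hypothetical inputs for illustration only; none of these names come from the paper.
BASE_COVARIATES = ["land_surface_temp", "vegetation_index", "rainfall"]
TEMPORAL_AGGREGATIONS = ["monthly", "annual", "synoptic"]
TRANSFORMATIONS = ["identity", "square", "log"]


def single_terms(covariates, aggregations, transformations):
    """Cross every covariate with every temporal aggregation and transformation."""
    return list(product(covariates, aggregations, transformations))


def pairwise_interactions(terms):
    """Form every unordered pair of single terms as a candidate interaction."""
    return list(combinations(terms, 2))


singles = single_terms(BASE_COVARIATES, TEMPORAL_AGGREGATIONS, TRANSFORMATIONS)
interactions = pairwise_interactions(singles)

# 3 covariates x 3 aggregations x 3 transformations = 27 single terms,
# and 27 * 26 / 2 = 351 pairwise interactions.
print(len(singles), len(interactions))  # prints "27 351"
```

Because the number of pairwise interactions grows roughly with the square of the number of single terms, even a few dozen base datasets with several aggregation and transformation options can plausibly reach tens of millions of candidates, which is why a staged reduction step follows the expansion.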
