Evaluating Bayesian spatial methods for modelling species distributions with clumped and restricted occurrence data

Statistical approaches for inferring the spatial distribution of taxa (Species Distribution Models, SDMs) commonly rely on available occurrence data, which are often clumped and geographically restricted. Although available SDM methods address some of these factors, they could be more directly and accurately modelled using a spatially-explicit approach. Software to fit models with spatial autocorrelation parameters in SDMs is now widely available, but whether such approaches improve predictions compared to other methodologies is unknown. Here, within a simulated environment using 1000 generated species’ ranges, we compared the performance of two commonly used non-spatial SDM methods (Maximum Entropy Modelling, MAXENT, and boosted regression trees, BRT) to a spatial Bayesian SDM method (fitted using R-INLA) when the underlying data exhibit varying combinations of clumping and geographic restriction. We also tested how recommended methodological settings designed to account for spatially non-random patterns in the data impact inference. The spatial Bayesian SDM method was the most consistently accurate, ranking among the two most accurate methods in 7 out of 8 data sampling scenarios. Within high-coverage sample datasets, all methods performed fairly similarly. When sampling points were randomly spread, BRT had a 1–3% greater accuracy than the other methods, and when samples were clumped, the spatial Bayesian SDM method had a 4–8% better AUC score. In contrast, when sampling points were restricted to a small section of the true range, all methods were on average 10–12% less accurate, with greater variation among the methods. Model inference under the recommended settings to account for autocorrelation was not impacted by clumping or restriction of data, except for the complexity of the spatial regression term in the spatial Bayesian model.
Methods such as those made available by R-INLA can be used successfully to account for spatial autocorrelation in an SDM context and, by taking account of random effects, produce outputs that better elucidate the role of covariates in predicting species occurrence. Given that it is often unclear what drives data clumping in an empirical occurrence dataset, or indeed how geographically restricted these data are, spatially-explicit Bayesian SDMs may be the better choice when modelling the spatial distribution of target species.
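
The sampling scenarios and the AUC metric referred to above can be illustrated with a minimal Python sketch. It is not the study's own code (the models were fitted in R), and every function name, the unit-square study area, and the default parameter values are assumptions made purely for illustration. It shows one plausible way to draw randomly spread, clumped, and geographically restricted sampling points, and how AUC can be computed as the probability that a presence site receives a higher predicted score than an absence site.

```python
import numpy as np


def auc_score(scores_presence, scores_absence):
    """AUC as the fraction of (presence, absence) pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    p = np.asarray(scores_presence, dtype=float)[:, None]
    a = np.asarray(scores_absence, dtype=float)[None, :]
    return float(np.mean((p > a) + 0.5 * (p == a)))


def sample_random(n, rng):
    """Sampling points spread uniformly over a unit-square study area (assumed)."""
    return rng.uniform(0.0, 1.0, size=(n, 2))


def sample_clumped(n, rng, n_centres=5, spread=0.03):
    """Sampling points clustered around a few random centres (hypothetical scheme)."""
    centres = rng.uniform(0.0, 1.0, size=(n_centres, 2))
    which = rng.integers(0, n_centres, size=n)
    points = centres[which] + rng.normal(0.0, spread, size=(n, 2))
    return np.clip(points, 0.0, 1.0)  # keep points inside the study area


def sample_restricted(n, rng, fraction=0.25):
    """Sampling points confined to a strip covering `fraction` of the area."""
    points = rng.uniform(0.0, 1.0, size=(n, 2))
    points[:, 0] *= fraction  # squeeze the x-coordinates into the strip
    return points


rng = np.random.default_rng(0)
random_pts = sample_random(200, rng)
clumped_pts = sample_clumped(200, rng)
restricted_pts = sample_restricted(200, rng)

# Three of four presence/absence pairs are ranked correctly here.
example_auc = auc_score([0.9, 0.4], [0.5, 0.1])
```

In this toy call, `example_auc` is 0.75. In a full comparison, the scores passed to `auc_score` would be each model's predicted suitability at held-out presence and absence locations.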
