Where is positional uncertainty a problem for species distribution modelling?

Species data held in museums and herbaria, survey data, and opportunistically observed data are a substantial information resource. A key challenge in using these data is uncertainty about where each observation is located. This matters when the data are used for species distribution modelling (SDM), because the coordinates are used to extract the environmental variables, so positional error may lead to inaccurate estimation of the species–environment relationship. The magnitude of this effect is related to the level of spatial autocorrelation in the environmental variables. Local spatial association is relevant here because it can identify the specific occurrence records that cause the largest drop in SDM accuracy. In this study, we therefore tested whether SDM predictions are more affected by positional uncertainty originating from locations that have lower local spatial association in their predictors. We performed this experiment for Spain and the Netherlands, using simulated datasets derived from well-known species distribution models (SDMs). We used the K statistic to quantify the local spatial association in the predictors at each species occurrence location. A probabilistic approach using Monte Carlo simulations was employed to introduce error into the species locations. The results revealed that positional uncertainty in species occurrence data at locations with low local spatial association in predictors reduced the prediction accuracy of the SDMs. We propose local spatial association as a way to identify the species occurrence records that require treatment for positional uncertainty. We also developed, and present here, a tool in the R environment to target observations that are likely to create error in SDM output as a result of positional uncertainty.
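The idea that positional error matters most where predictors vary sharply over short distances can be illustrated with a small sketch. It is not the authors' R tool and does not use the statistic named in the abstract: the toy grid, the function names, the Gaussian displacement, and the neighbourhood-difference measure are all assumptions made for illustration. The sketch displaces a record repeatedly (a Monte Carlo simulation) and measures how much the extracted predictor value changes.

```python
import random


def make_toy_grid(size=20, seed=1):
    """Toy predictor raster (hypothetical): left half constant, right half noisy."""
    rng = random.Random(seed)
    return [
        [10.0 if c < size // 2 else rng.uniform(0.0, 20.0) for c in range(size)]
        for _ in range(size)
    ]


def local_heterogeneity(grid, row, col, radius=1):
    """Mean absolute difference between a cell and its neighbours.

    A simple stand-in for a local spatial association statistic:
    low values mean the neighbourhood resembles the cell.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    centre = grid[row][col]
    diffs = [
        abs(grid[r][c] - centre)
        for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
        if (r, c) != (row, col)
    ]
    return sum(diffs) / len(diffs)


def perturb(row, col, sd, n_rows, n_cols, rng):
    """Displace a location by Gaussian noise (in cell units), clamped to the grid."""
    new_row = min(n_rows - 1, max(0, round(row + rng.gauss(0.0, sd))))
    new_col = min(n_cols - 1, max(0, round(col + rng.gauss(0.0, sd))))
    return new_row, new_col


def extraction_error(grid, row, col, sd=1.0, n_sims=500, seed=0):
    """Monte Carlo estimate of the mean absolute change in the extracted value."""
    rng = random.Random(seed)
    n_rows, n_cols = len(grid), len(grid[0])
    true_value = grid[row][col]
    total = 0.0
    for _ in range(n_sims):
        r, c = perturb(row, col, sd, n_rows, n_cols, rng)
        total += abs(grid[r][c] - true_value)
    return total / n_sims


grid = make_toy_grid()
for label, (row, col) in {"smooth": (10, 3), "rough": (10, 15)}.items():
    print(label, local_heterogeneity(grid, row, col), extraction_error(grid, row, col))
```

In this toy setting, the record in the smooth half extracts the same predictor value wherever it is displaced, whereas the record in the noisy half does not, which is the pattern that would motivate flagging records by their local neighbourhood before modelling.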
