sampbias, a method for quantifying geographic sampling biases in species distribution data

Geo-referenced species occurrences from public databases have become essential to biodiversity research and conservation. However, geographical biases are widely recognized as a factor limiting the usefulness of such data for understanding species diversity and distribution. In particular, differences in sampling intensity across a landscape due to differences in human accessibility are ubiquitous, but may differ in strength among taxonomic groups and datasets. Although several factors are known to influence human access (such as the presence of roads, rivers, airports and cities), quantifying their specific and combined effects on recorded occurrence data remains challenging. Here we present sampbias, an algorithm and software for quantifying the effect of accessibility biases in species occurrence datasets. Sampbias uses a Bayesian approach to estimate how sampling rates vary as a function of proximity to one or multiple bias factors. The results are comparable among bias factors and datasets. We demonstrate the use of sampbias on a dataset of mammal occurrences from the island of Borneo, showing a strong biasing effect of cities and a moderate effect of roads and airports. Sampbias is implemented as a well-documented, open-access and user-friendly R package that we hope will become a standard tool for anyone working with species occurrences in ecology, evolution, conservation and related fields.
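The idea of a sampling rate that varies with proximity to bias factors can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the sampbias R package or its API. It assumes, as one plausible functional form not stated in the abstract, that the expected number of records in a grid cell decays exponentially with the weighted distances to each bias factor and that counts are Poisson distributed. The names `sampling_rate`, `poisson_log_likelihood`, `q` and `weights`, and the toy data, are hypothetical.

```python
import math

def sampling_rate(distances, q, weights):
    """Expected records in a cell: q * exp(-sum_j w_j * d_j).

    ASSUMPTION: exponential distance decay; q is the rate at distance zero,
    weights[j] is the strength of bias factor j (e.g. cities, roads).
    """
    return q * math.exp(-sum(w * d for w, d in zip(weights, distances)))

def poisson_log_likelihood(counts, cell_distances, q, weights):
    """Log-likelihood of observed per-cell record counts under the model."""
    total = 0.0
    for n, dists in zip(counts, cell_distances):
        lam = sampling_rate(dists, q, weights)
        total += n * math.log(lam) - lam - math.lgamma(n + 1)
    return total

# Hypothetical toy data: distance (km) of four cells to the nearest (city, road)
cell_distances = [(0.0, 0.0), (10.0, 1.0), (50.0, 5.0), (100.0, 20.0)]
counts = [20, 12, 3, 0]  # records per cell

# Compare a distance-biased model with a uniform-rate model (all weights zero)
biased = poisson_log_likelihood(counts, cell_distances, q=20.0, weights=(0.04, 0.02))
unbiased = poisson_log_likelihood(counts, cell_distances, q=8.75, weights=(0.0, 0.0))
print(biased > unbiased)
```

In a Bayesian treatment, a likelihood of this kind would be combined with priors on `q` and the weights and explored by posterior sampling; a larger weight for one factor would then indicate a stronger biasing effect of that factor.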
