Integrated species distribution models: combining presence‐background data and site‐occupancy data with imperfect detection

Summary

Two main sources of data for species distribution models (SDMs) are site-occupancy (SO) data from planned surveys and presence-background (PB) data from opportunistic surveys and other sources. SO surveys give high-quality data about the presence and absence of the species in a particular area. However, because of their high cost, they often cover a smaller area than PB data and are usually not representative of the species' geographic range. In contrast, PB data are plentiful and cover a larger area, but they are less reliable because they carry no information on species absences and are usually collected with biased sampling effort.

Here we present a new approach to species distribution modelling that integrates these two data types. We use an inhomogeneous Poisson point process as the basis for an integrated SDM that fits both PB and SO data simultaneously. It is the first implementation of an Integrated SO–PB Model that uses repeated-survey occupancy data and also incorporates detection probability.

The Integrated Model's performance was evaluated using simulated data and compared with approaches using PB or SO data alone. It was found to be superior, improving predictions of species spatial distributions even when SO data are sparse and collected in a limited area. The Integrated Model was also found to be effective when environmental covariates were significantly correlated.

Our method was demonstrated with real SO and PB data for the yellow-bellied glider (Petaurus australis) in south-eastern Australia, where the predictive performance of the Integrated Model was again superior.

PB models are known to produce biased estimates of species occupancy or abundance, and the small sample size of SO datasets often results in poor out-of-sample predictions. Integrated models combine data from these two sources, providing better predictions of species abundance than either data source alone.
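The idea of one likelihood that fits PB and SO data simultaneously can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the landscape is discretised into grid cells; the intensity is log-linear in a single environmental covariate (parameters `beta`); PB sampling bias is log-linear in a hypothetical bias covariate `w` (parameters `alpha`); each SO site has a known area; and detection probability `p` is assumed constant across sites and visits. The occupancy part is tied to the same intensity through psi = 1 - exp(-lambda * A), the Poisson probability that a site of area A contains at least one individual. All function and argument names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def intensity(beta0: float, beta1: float, x: float) -> float:
    """Expected individuals per unit area at covariate value x.

    Assumed log-linear form: log(lambda) = beta0 + beta1 * x.
    """
    return math.exp(beta0 + beta1 * x)


def pb_loglik(beta: Tuple[float, float], alpha: Tuple[float, float],
              cell_x: Sequence[float], cell_w: Sequence[float],
              cell_area: Sequence[float], pb_cells: Sequence[int]) -> float:
    """Log-likelihood of PB records as a thinned inhomogeneous Poisson process.

    Observed PB intensity = lambda(s) * b(s), with a hypothetical sampling-bias
    term b(s) = exp(alpha0 + alpha1 * w(s)). Each record is evaluated at the
    intensity of the grid cell containing it, and the integral over the study
    region is approximated by a sum over grid cells.
    """
    def observed(k: int) -> float:
        return intensity(beta[0], beta[1], cell_x[k]) * math.exp(alpha[0] + alpha[1] * cell_w[k])

    log_points = sum(math.log(observed(k)) for k in pb_cells)
    integral = sum(observed(k) * cell_area[k] for k in range(len(cell_x)))
    return log_points - integral


def so_loglik(beta: Tuple[float, float], p: float,
              site_x: Sequence[float], site_area: Sequence[float],
              histories: Sequence[Sequence[int]]) -> float:
    """Log-likelihood of repeated-visit SO data with imperfect detection.

    psi = 1 - exp(-lambda * A) links occupancy to the shared intensity;
    p is an assumed constant per-visit detection probability.
    """
    total = 0.0
    for x, area, y in zip(site_x, site_area, histories):
        psi = 1.0 - math.exp(-intensity(beta[0], beta[1], x) * area)
        detections, visits = sum(y), len(y)
        # Probability of this detection history if the site is occupied.
        given_occupied = p ** detections * (1.0 - p) ** (visits - detections)
        if detections > 0:
            total += math.log(psi * given_occupied)
        else:
            # All-zero history: occupied but missed on every visit, or truly unoccupied.
            total += math.log(psi * given_occupied + (1.0 - psi))
    return total


def integrated_loglik(beta, alpha, p, pb_data, so_data) -> float:
    """Joint log-likelihood; both parts share the intensity parameters beta."""
    return pb_loglik(beta, alpha, *pb_data) + so_loglik(beta, p, *so_data)


# Illustrative toy data (not from the study): four grid cells of unit area.
cell_x = [0.0, 0.5, 1.0, 1.5]       # environmental covariate per cell
cell_w = [1.0, 0.0, 0.5, 0.0]       # hypothetical sampling-bias covariate per cell
cell_area = [1.0, 1.0, 1.0, 1.0]
pb_cells = [0, 2, 2, 3]             # grid cell of each PB record
site_x = [0.2, 1.4]                 # covariate at each SO site
site_area = [0.1, 0.1]
histories = [[0, 1, 0], [0, 0, 0]]  # detections on three repeat visits

ll = integrated_loglik((0.0, 1.0), (-1.0, 0.5), 0.4,
                       (cell_x, cell_w, cell_area, pb_cells),
                       (site_x, site_area, histories))
print(f"integrated log-likelihood: {ll:.3f}")
```

Because the PB term depends on the intercepts only through `beta[0] + alpha[0]`, PB data alone cannot separate overall abundance from sampling effort; the SO term depends on `beta[0]` directly and supplies that missing information. Maximising `integrated_loglik` with any numerical optimiser would give joint estimates of `beta`, `alpha` and `p`.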
Unlike conventional SDMs, whose predictions are tied to the spatial scale at which they were fitted, our Integrated Model is based on a point process model and has no such scale dependence. It may be used to predict abundance at any spatial scale while maintaining the underlying relationship between abundance and area.
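A short sketch of why a point-process model transfers across scales, assuming a fitted intensity lambda(s) is available on a fine grid (the intensity values and cell sizes below are illustrative, not results from the study): expected abundance in any region is the integral of lambda over that region, and the probability that the region is occupied is 1 - exp(-integral of lambda). Predictions for a coarse cell therefore follow directly from the fine-scale intensity. The function names are hypothetical.

```python
import math


def expected_abundance(intensities, areas):
    """Expected number of individuals in a region: the integral of lambda,
    approximated by summing lambda_k * a_k over the grid cells it covers."""
    return sum(lam * a for lam, a in zip(intensities, areas))


def occupancy_probability(intensities, areas):
    """P(at least one individual in the region) under a Poisson point process."""
    return 1.0 - math.exp(-expected_abundance(intensities, areas))


# Hypothetical fine-scale predictions: four cells of 0.25 km^2 each.
fine_lambda = [0.8, 1.2, 2.0, 0.4]  # individuals per km^2 (illustrative)
fine_area = [0.25] * 4

# Aggregate the same intensity to one 1 km^2 cell.
coarse_abundance = expected_abundance(fine_lambda, fine_area)
coarse_occupancy = occupancy_probability(fine_lambda, fine_area)

# Consistency check: the coarse cell is empty only if every fine cell is empty,
# and Poisson counts in disjoint cells are independent.
fine_psi = [occupancy_probability([lam], [a]) for lam, a in zip(fine_lambda, fine_area)]
prob_all_fine_empty = math.prod(1.0 - psi for psi in fine_psi)

# Naively averaging fine-scale occupancy does NOT give the coarse-cell value.
mean_fine_psi = sum(fine_psi) / len(fine_psi)

print(f"coarse abundance: {coarse_abundance:.3f}")
print(f"coarse occupancy: {coarse_occupancy:.3f} vs 1 - P(all fine empty): {1.0 - prob_all_fine_empty:.3f}")
print(f"mean of fine-scale occupancy: {mean_fine_psi:.3f}")
```

The last comparison shows the contrast with a presence-absence SDM: an occupancy probability estimated for one cell size cannot simply be averaged to a larger cell, whereas the point-process intensity can be integrated over any area.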
