Bias correction in species distribution models: pooling survey and collection data for multiple species

Presence‐only records may provide data on the distributions of rare species, but commonly suffer from large, unknown biases due to their typically haphazard collection schemes. Presence–absence or count data collected in systematic, planned surveys are more reliable but typically less abundant. We propose a probabilistic model that allows joint analysis of presence‐only and survey data to exploit their complementary strengths. Our method pools presence‐only and presence–absence data for many species and maximizes a joint likelihood, simultaneously estimating and adjusting for the sampling bias affecting the presence‐only data. By assuming that the sampling bias is the same for all species, we can borrow strength across species to efficiently estimate the bias and improve our inference from presence‐only data. We evaluate our model's performance on data for 36 eucalypt species in south‐eastern Australia. We find that presence‐only records exhibit a strong sampling bias towards the coast and towards Sydney, the largest city. Our data‐pooling technique substantially improves the out‐of‐sample predictive performance of our model when presence–absence data for a given species are scarce. If we have only presence‐only data and no presence–absence data for a given species, but both types of data for several other species that suffer from the same spatial sampling bias, then our method can obtain an unbiased estimate of the first species' geographic range.
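To make the pooling idea concrete, the following is a minimal sketch of one joint likelihood of this kind. The abstract does not state the functional form, so everything here is an assumption made for illustration: each species has a log-linear Poisson process intensity, presence‐only records are a thinned version of that process with bias coefficients `delta` shared by all species, presence–absence sites follow a complementary log-log model with unit site area, and the integral over the study region is approximated by weighted background points. All function and argument names are hypothetical.

```python
import numpy as np

def joint_neg_log_lik(alpha, beta, gamma, delta,
                      po_x, po_z, po_species,
                      bg_x, bg_z, bg_weight,
                      pa_x, pa_y):
    """Negative joint log-likelihood for K species (illustrative sketch).

    alpha (K,), beta (K, p): species intercepts and habitat coefficients.
    gamma (K,): species-specific presence-only reporting intercepts.
    delta (q,): bias coefficients, assumed identical for every species.
    po_x (n_po, p), po_z (n_po, q): habitat and bias covariates at
        presence-only records; po_species (n_po,) gives each record's species.
    bg_x, bg_z, bg_weight: background (quadrature) points and their area
        weights, used to approximate the integral of the intensity.
    pa_x (n_pa, p), pa_y (n_pa, K): survey-site covariates and 0/1 outcomes.
    """
    # Presence-only part: log of the thinned intensity at each record.
    log_int_po = (alpha[po_species] + gamma[po_species]
                  + np.sum(po_x * beta[po_species], axis=1)
                  + po_z @ delta)
    # Approximate integral of each species' thinned intensity over the region.
    log_int_bg = (alpha + gamma)[None, :] + bg_x @ beta.T + (bg_z @ delta)[:, None]
    integral = bg_weight @ np.exp(log_int_bg)  # shape (K,)
    ll_po = log_int_po.sum() - integral.sum()

    # Presence-absence part: surveys are assumed unbiased, so only the
    # species process (alpha, beta) enters; P(y = 1) = 1 - exp(-mu).
    mu = np.exp(alpha[None, :] + pa_x @ beta.T)  # shape (n_pa, K)
    log_p1 = np.log(-np.expm1(-mu))
    ll_pa = np.sum(pa_y * log_p1 - (1 - pa_y) * mu)

    return -(ll_po + ll_pa)
```

Because `delta` appears only in the presence‐only term but is common to every species, records from well-surveyed species help estimate it, which is the sense in which such a model borrows strength across species. In practice the function could be passed to a general-purpose optimizer after flattening the parameters into a single vector.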
