Correction note: Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology

Presence-only data, point locations where a species has been recorded as being present, are often used in modeling the distribution of a species as a function of a set of explanatory variables, whether to map species occurrence, to understand its association with the environment, or to predict its response to environmental change. Currently, ecologists most commonly analyze presence-only data by adding randomly chosen “pseudo-absences” to the data such that they can be analyzed using logistic regression, an approach which has weaknesses in model specification, in interpretation, and in implementation. To address these issues, we propose Poisson point process modeling of the intensity of presences. We also derive a link between the proposed approach and logistic regression: specifically, we show that as the number of pseudo-absences increases (in a regular or uniform random arrangement), logistic regression slope parameters and their standard errors converge to those of the corresponding Poisson point process model. We discuss the practical implications of these results. In particular, point process modeling offers a framework for choice of the number and location of pseudo-absences, both of which are currently chosen by ad hoc and sometimes ineffective methods in ecology, a point which we illustrate by example.
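
The convergence result stated above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional study region [0, 1], a single covariate equal to location, a true slope of 2, 200 simulated presences, and pseudo-absences placed on a regular midpoint grid. All function and variable names (`newton_max`, `ppm_loglik`, `logistic_loglik`, `midpoint_grid`, `design`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def newton_max(fun, beta, n_iter=100, tol=1e-10):
    """Maximise a concave log-likelihood by Newton steps with step-halving."""
    ll, grad, hess = fun(beta)
    for _ in range(n_iter):
        step = np.linalg.solve(-hess, grad)
        t = 1.0
        cand = fun(beta + step)
        # Halve the step until the log-likelihood no longer decreases.
        while (not (cand[0] >= ll)) and t > 1e-8:
            t /= 2.0
            cand = fun(beta + t * step)
        beta = beta + t * step
        converged = abs(cand[0] - ll) < tol
        ll, grad, hess = cand
        if converged:
            break
    return beta

def ppm_loglik(beta, X_pres, X_quad, w):
    """Poisson point process log-likelihood; the integral of the intensity
    is approximated by a weighted sum over quadrature points."""
    lam = np.exp(X_quad @ beta)
    ll = np.sum(X_pres @ beta) - np.sum(w * lam)
    grad = X_pres.sum(axis=0) - X_quad.T @ (w * lam)
    hess = -(X_quad * (w * lam)[:, None]).T @ X_quad
    return ll, grad, hess

def logistic_loglik(beta, X, y):
    """Ordinary logistic regression log-likelihood, gradient and Hessian."""
    eta = X @ beta
    p = 1.0 / (1.0 + np.exp(-eta))
    ll = np.sum(y * eta - np.logaddexp(0.0, eta))
    grad = X.T @ (y - p)
    hess = -(X * (p * (1.0 - p))[:, None]).T @ X
    return ll, grad, hess

def midpoint_grid(n):
    """n regularly spaced points on [0, 1]."""
    return (np.arange(n) + 0.5) / n

def design(x):
    """Design matrix with an intercept column and the covariate x."""
    return np.column_stack([np.ones_like(x), x])

# Assumed toy data: m presences drawn (by inverse CDF) from a density
# proportional to exp(true_slope * x) on [0, 1].
rng = np.random.default_rng(0)
m, true_slope = 200, 2.0
u = rng.uniform(size=m)
x_pres = np.log1p(u * np.expm1(true_slope)) / true_slope

# Point process fit using a fine quadrature grid with equal weights.
x_quad = midpoint_grid(100_000)
w = np.full(x_quad.size, 1.0 / x_quad.size)
beta_ppm = newton_max(
    lambda b: ppm_loglik(b, design(x_pres), design(x_quad), w),
    np.array([np.log(m), 0.0]),
)
print(f"point process slope: {beta_ppm[1]:.4f}")

# Logistic regression with an increasing number of pseudo-absences.
slopes = {}
for n in (100, 1_000, 10_000, 100_000):
    X_all = design(np.concatenate([x_pres, midpoint_grid(n)]))
    y = np.concatenate([np.ones(m), np.zeros(n)])
    beta = newton_max(
        lambda b: logistic_loglik(b, X_all, y),
        np.array([np.log(m / n), 0.0]),
    )
    slopes[n] = beta[1]
    print(f"n = {n:>7}: logistic slope = {beta[1]:.4f}")
```

Only the slopes are compared here. The logistic intercept depends on the number of pseudo-absences and is therefore not expected to settle on the point process intercept.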
