Finite-Sample Equivalence in Statistical Models for Presence-Only Data

Statistical modeling of presence-only data has attracted much recent attention in the ecological literature, leading to a proliferation of methods, including the inhomogeneous Poisson process (IPP) model, maximum entropy (Maxent) modeling of species distributions, and logistic regression models. Several recent articles have shown the close relationships between these methods. We explain why the IPP intensity function is a more natural object of inference in presence-only studies than occurrence probability (which is defined only with reference to quadrat size), and why presence-only data allow estimation only of the relative, not the absolute, intensity of species occurrence. All three of the above techniques amount to parametric density estimation under the same exponential family model (in the case of the IPP, the fitted density is multiplied by the number of presence records to obtain a fitted intensity). We show that the IPP and Maxent give exactly the same estimate for this density, but that logistic regression in general yields a different estimate in finite samples. When the model is misspecified (as it practically always is), logistic regression and the IPP may converge to substantially different limits as the data set grows. We propose "infinitely weighted logistic regression," which is exactly equivalent to the IPP in finite samples. Consequently, many already-implemented methods extending logistic regression can also extend the Maxent and IPP models in directly analogous ways using this technique.
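The equivalence claimed for infinitely weighted logistic regression can be checked numerically on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses a single covariate, treats the background points as the quadrature points for the IPP integral, and approximates "infinite" weight by a sequence of large finite weights on the background points. The data are invented and deliberately misspecified for the log-linear model, and all function names (log_sigmoid, fit_ipp_slope, fit_weighted_logistic) are hypothetical. Under these assumptions, the IPP/Maxent slope is the one at which the fitted density on the background points reproduces the mean covariate value of the presence records.

```python
import math

def log_sigmoid(t):
    """Numerically stable log(1 / (1 + exp(-t)))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))

def fit_ipp_slope(presence, background, lo=-20.0, hi=20.0):
    """Slope b of the density exp(b*z)/sum_j exp(b*z_j) on the background
    points whose mean matches the presence mean (the IPP/Maxent condition
    under the quadrature assumption). Solved by bisection, since the
    fitted mean is increasing in b."""
    target = sum(presence) / len(presence)

    def fitted_mean(b):
        w = [math.exp(b * z) for z in background]
        return sum(wi * z for wi, z in zip(w, background)) / sum(w)

    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if fitted_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def weighted_loglik(a, b, presence, background, bg_weight):
    """Logistic log-likelihood with weight 1 on presences (y=1)
    and weight bg_weight on background points (y=0)."""
    ll = math.fsum(log_sigmoid(a + b * x) for x in presence)
    ll += bg_weight * math.fsum(log_sigmoid(-(a + b * z)) for z in background)
    return ll

def fit_weighted_logistic(presence, background, bg_weight, max_iter=100):
    """Slope of the weighted logistic fit, found by Newton steps with step halving."""
    rows = [(x, 1.0, 1.0) for x in presence]
    rows += [(z, 0.0, bg_weight) for z in background]
    # Start from the intercept-only solution with zero slope.
    a = math.log(len(presence) / (bg_weight * len(background)))
    b = 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y, w in rows:
            p = math.exp(log_sigmoid(a + b * x))
            r = w * (y - p)        # weighted residual (gradient term)
            v = w * p * (1.0 - p)  # weighted variance (Hessian term)
            g_a += r
            g_b += r * x
            h_aa += v
            h_ab += v * x
            h_bb += v * x * x
        # Solve the 2x2 Newton system explicitly.
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        # Halve the step until the log-likelihood does not decrease.
        ll = weighted_loglik(a, b, presence, background, bg_weight)
        step = 1.0
        while step > 1e-8 and weighted_loglik(
                a + step * da, b + step * db, presence, background, bg_weight) < ll:
            step /= 2.0
        a += step * da
        b += step * db
        if abs(step * da) + abs(step * db) < 1e-12:
            break
    return b

# Toy data (assumed): background on a regular grid over [-2, 2];
# presences spread evenly over [0, 1.6], so the log-linear model is misspecified.
n0, n1 = 500, 100
background = [-2.0 + 4.0 * (j + 0.5) / n0 for j in range(n0)]
presence = [1.6 * (i + 0.5) / n1 for i in range(n1)]

ipp_slope = fit_ipp_slope(presence, background)
slopes = {w: fit_weighted_logistic(presence, background, w)
          for w in (1.0, 1e2, 1e4, 1e6)}

print(f"IPP / Maxent slope: {ipp_slope:.6f}")
for w, s in slopes.items():
    print(f"background weight {w:>9.0f}: slope {s:.6f}, gap {abs(s - ipp_slope):.2e}")
```

With a background weight of 1 this is ordinary logistic regression, whose slope differs from the IPP slope on these data; as the weight grows, the printed gap should shrink toward zero. In practice the same effect could presumably be obtained by passing large case weights for the background points to any existing logistic regression routine that supports them, which is what lets extensions of logistic regression carry over to the IPP and Maxent settings.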
