Likelihood analysis of species occurrence probability from presence‐only data for modelling species distributions

Summary

1. Understanding the factors affecting species occurrence is a pre-eminent focus of applied ecological research. However, direct information about species occurrence is lacking for many species. Instead, researchers sometimes have to rely on so-called presence-only data (i.e. when no direct information about absences is available), which often results from opportunistic, unstructured sampling. Maxent is a widely used software program designed to model and map species distribution using presence-only data.

2. We provide a critical review of Maxent as applied to species distribution modelling and discuss how it can lead to inferential errors. A chief concern is that Maxent produces a number of poorly defined indices that are not directly related to the actual parameter of interest – the probability of occurrence (ψ). This focus on an index was motivated by the belief that it is not possible to estimate ψ from presence-only data; however, we demonstrate that ψ is identifiable using conventional likelihood methods under the assumptions of random sampling and constant probability of species detection.

3. The model is implemented in a convenient R package which we use to apply the model to simulated data and data from the North American Breeding Bird Survey. We demonstrate that Maxent produces extreme under-predictions when compared to estimates produced by logistic regression which uses the full (presence/absence) data set. We note that Maxent predictions are extremely sensitive to specification of the background prevalence, which is not objectively estimated using the Maxent method.

4. As with Maxent, formal model-based inference requires a random sample of presence locations. Many presence-only data sets, such as those based on museum records and herbarium collections, may not satisfy this assumption. However, when sampling is random, we believe that inference should be based on formal methods that facilitate inference about interpretable ecological quantities instead of vaguely defined indices.
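
The likelihood idea in point 2 can be illustrated with a minimal sketch. This is not the authors' R package. It assumes a logistic form for ψ with a single covariate, a landscape made up of a finite set of equally likely cells, and constant detection. Every name in it (`psi`, `neg_log_lik`, `landscape_x`, `presence_x`) and every simulated value is hypothetical. Under those assumptions, each presence record contributes ψ at its location divided by the sum of ψ over all landscape cells.

```python
import math
import random

def psi(beta0, beta1, x):
    """Hypothetical occurrence probability: logistic in a single covariate x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def neg_log_lik(beta0, beta1, presence_x, landscape_x):
    """Negative log-likelihood of the covariate values at presence locations.

    Each presence contributes psi(x_i) divided by the sum of psi over all
    landscape cells, i.e. the covariate distribution conditional on presence,
    assuming cells are sampled uniformly at random.
    """
    log_denom = math.log(sum(psi(beta0, beta1, x) for x in landscape_x))
    return -sum(math.log(psi(beta0, beta1, x)) - log_denom for x in presence_x)

# Simulated landscape (assumption: 2000 cells, standard-normal covariate).
random.seed(1)
landscape_x = [random.gauss(0.0, 1.0) for _ in range(2000)]
# Occupied cells drawn with assumed true parameters beta0 = -1, beta1 = 1.
occupied = [x for x in landscape_x if random.random() < psi(-1.0, 1.0, x)]
# Presence-only sample: a random subset of the occupied cells.
presence_x = random.sample(occupied, min(200, len(occupied)))

# Crude grid search over both parameters (a real analysis would use an optimiser).
grid = [i / 4.0 for i in range(-12, 13)]
best = min((neg_log_lik(b0, b1, presence_x, landscape_x), b0, b1)
           for b0 in grid for b1 in grid)
print("grid estimate (beta0, beta1):", best[1], best[2])
```

When `beta1` is zero, ψ is constant and cancels from the ratio, so the likelihood no longer depends on `beta0`. In this sketch the intercept is informed only through the way ψ varies with the covariate, and a coarse grid search on a small simulated sample should not be expected to recover it precisely.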
