POC plots: calibrating species distribution models with presence-only data.

Statistical models are widely used for predicting species' geographic distributions and for analyzing species' responses to climatic and other predictor variables. Their predictive performance can be characterized in two complementary ways: discrimination, the ability to distinguish between occupied and unoccupied sites, and calibration, the extent to which a model correctly predicts the conditional probability of presence. The most common measures of model performance, such as the area under the receiver operating characteristic curve (AUC), measure only discrimination. In contrast, we introduce a new tool for measuring model calibration: the presence-only calibration plot, or POC plot. This tool relies on presence-only evaluation data, which are more widely available than presence-absence evaluation data, to determine whether predictions are proportional to the conditional probability of presence. We generalize the predicted/expected curves of Hirzel et al. to produce a presence-only analogue of traditional (presence-absence) calibration curves. POC plots facilitate visual exploration of model calibration and can be used to recalibrate badly calibrated models. We demonstrate their use by recalibrating models made by the DOMAIN modeling method on a comprehensive set of 226 species from six regions of the world, significantly improving DOMAIN's predictive performance.
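The abstract does not spell out how a POC plot is computed, so the following is only a rough sketch of the predicted/expected idea it builds on, not the authors' procedure. It assumes model scores are available at presence sites and at a sample of background sites; the function name `predicted_expected_ratio`, the equal-width binning, and the `n_bins` parameter are all illustrative choices. For each bin of model output, it divides the share of presences falling in the bin by the share of background sites falling there. If predictions were proportional to probability of presence, this ratio would be expected to rise roughly in proportion to the bin's score.

```python
import numpy as np

def predicted_expected_ratio(presence_scores, background_scores, n_bins=5):
    """Per-bin ratio of presence frequency to background frequency.

    Hypothetical helper for illustration only; binning scheme is an assumption.
    """
    presence_scores = np.asarray(presence_scores, dtype=float)
    background_scores = np.asarray(background_scores, dtype=float)

    # Equal-width bins spanning the full range of observed scores.
    lo = min(presence_scores.min(), background_scores.min())
    hi = max(presence_scores.max(), background_scores.max())
    edges = np.linspace(lo, hi, n_bins + 1)

    p_counts, _ = np.histogram(presence_scores, bins=edges)
    b_counts, _ = np.histogram(background_scores, bins=edges)

    predicted = p_counts / p_counts.sum()   # share of presences per bin
    expected = b_counts / b_counts.sum()    # share of background per bin

    # Bins with no background sites have an undefined ratio (NaN).
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(expected > 0, predicted / expected, np.nan)

    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, ratio

centers, ratio = predicted_expected_ratio(
    [0.2, 0.6, 0.8, 1.0],   # made-up scores at presence sites
    [0.0, 0.1, 0.7, 0.9],   # made-up scores at background sites
    n_bins=2,
)
```

Plotting `ratio` against `centers` gives a crude visual check: a curve that is flat, or that bends sharply away from a straight line through the origin, suggests the scores are not proportional to probability of presence and may benefit from recalibration.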

[1] A. Hirzel et al. Evaluating the ability of habitat suitability models to predict species presences, 2006.

[2] Simon Ferrier et al. Evaluating the predictive performance of habitat models developed using logistic regression, 2000.

[3] A. H. Murphy et al. Diagnostic verification of probability forecasts, 1992.

[4] M. Boyce et al. Evaluating resource selection functions, 2002.

[5] John Platt et al. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, 1999.

[6] A. Townsend Peterson et al. Novel methods improve prediction of species' distributions from occurrence data, 2006.

[7] F. T. Wright et al. Order restricted statistical inference, 1988.

[8] G. Carpenter et al. DOMAIN: a flexible modelling procedure for mapping potential distributions of plants and animals, 1993, Biodiversity & Conservation.

[9] D. Cox. Two further applications of a model for binary regression, 1958.

[10] Daniel B. Mark et al. Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors, 1996.

[11] Stefan Heinänen et al. Modelling species distribution in complex environments: an evaluation of predictive ability and reliability in five shorebird species, 2009.

[12] Steven J. Phillips et al. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data, 2009, Ecological Applications: a publication of the Ecological Society of America.

[13] Robert P. Anderson et al. Maximum entropy modeling of species geographic distributions, 2006.

[14] B. Reineking et al. Constrain to perform: Regularization of habitat models, 2006.

[15] W. Ponder et al. Evaluation of Museum Collection Data for Use in Biodiversity Assessment, 2001.

[16] U. Grenander. On the theory of mortality measurement, 1956.

[17] J. Hanley et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve, 1982, Radiology.

[18] D. Borras et al. Modelling the distributions and spatial coincidence of bluetongue vectors Culicoides imicola and the Culicoides obsoletus group throughout the Iberian peninsula, 2008, Medical and Veterinary Entomology.

[19] T. Hastie et al. Presence‐Only Data and the EM Algorithm, 2009, Biometrics.

[20] Rich Caruana et al. Predicting good probabilities with supervised learning, 2005, ICML.

[21] Miha Vuk et al. ROC curve, lift chart and calibration plot, 2006, Advances in Methodology and Statistics.

[22] John Bell et al. A review of methods for the assessment of prediction errors in conservation presence/absence models, 1997, Environmental Conservation.