Selecting thresholds of occurrence in the prediction of species distributions

Transforming the results of species distribution modelling from probabilities of, or suitabilities for, species occurrence into presences/absences requires a threshold. Although many approaches to determining thresholds exist, no study has compared them. In this paper, twelve approaches were compared using two species in Europe and artificial neural networks, and the modelling results were assessed using four indices: sensitivity, specificity, overall prediction success and Cohen's kappa statistic. The results show that the prevalence approach, the average predicted probability/suitability approach, and three sensitivity-specificity-combined approaches (the sensitivity-specificity sum maximization approach, the sensitivity-specificity equality approach, and the approach based on the shortest distance to the top-left corner (0,1) in the ROC plot) perform well. The commonly used kappa maximization approach is not as good as the aforementioned ones, and the fixed threshold approach is the worst. We also recommend using datasets with a prevalence of 50% to build models where possible, since most optimization criteria might then be satisfied or nearly satisfied at the same time, which makes it easier to find optimal thresholds.
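To make the threshold criteria named above concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names `confusion_rates` and `select_thresholds`, the grid of 101 candidate thresholds, and the convention that a site is predicted present when its probability is greater than or equal to the threshold are all assumptions made for illustration. Only some of the twelve approaches are covered, and the data must contain both presences and absences.

```python
import numpy as np


def confusion_rates(y_true, y_prob, threshold):
    """Return (sensitivity, specificity, kappa) at one threshold.

    Assumption: presence is predicted when y_prob >= threshold.
    y_true must contain at least one presence and one absence.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_prob, dtype=float) >= threshold
    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return sensitivity, specificity, kappa


def select_thresholds(y_true, y_prob, candidates=None):
    """Return one threshold per criterion (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if candidates is None:
        # Assumed search grid: 0.00, 0.01, ..., 1.00.
        candidates = np.linspace(0.0, 1.0, 101)
    stats = np.array([confusion_rates(y_true, y_prob, t) for t in candidates])
    sens, spec, kappa = stats.T
    return {
        # Prevalence of the species in the model-building data.
        "prevalence": float(y_true.mean()),
        # Mean of the predicted probabilities/suitabilities.
        "mean_probability": float(y_prob.mean()),
        # Maximize sensitivity + specificity.
        "sens_spec_sum_max": float(candidates[np.argmax(sens + spec)]),
        # Minimize |sensitivity - specificity|.
        "sens_spec_equality": float(candidates[np.argmin(np.abs(sens - spec))]),
        # ROC point (1 - specificity, sensitivity) closest to the corner (0, 1).
        "roc_closest_to_corner": float(
            candidates[np.argmin(np.hypot(1 - spec, 1 - sens))]
        ),
        # Maximize Cohen's kappa.
        "kappa_max": float(candidates[np.argmax(kappa)]),
        # Fixed threshold, shown for comparison.
        "fixed": 0.5,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    observed = [0, 0, 0, 0, 1, 1, 1, 1]
    predicted = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
    for name, value in select_thresholds(observed, predicted).items():
        print(f"{name}: {value:.2f}")
```

When several candidates tie on a criterion, `np.argmax` and `np.argmin` return the first one, so the lowest tied threshold is reported; a different tie-breaking rule would be an equally reasonable choice.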
