On the selection of thresholds for predicting species occurrence with presence‐only data

Abstract

Presence‐only data present challenges for selecting thresholds to transform species distribution modeling results into binary outputs. In this article, we compare two recently published threshold selection methods (maxSSS and maxFpb) and examine the effectiveness of the threshold‐based prevalence estimation approach. Six virtual species with varying prevalence were simulated within a real landscape in southeastern Australia. Presence‐only models were built with DOMAIN, a generalized linear model, Maxent, and Random Forest. Thresholds were selected with the two methods, maxSSS and maxFpb, using four presence‐only datasets with different ratios of the number of known presences to the number of random points (KP–RP ratio). Sensitivity, specificity, the true skill statistic, and the F measure were used to evaluate the results. Species prevalence was estimated as the ratio of the number of predicted presences to the total number of points in the evaluation dataset. Thresholds selected with maxFpb varied as the KP–RP ratio of the threshold selection datasets changed. Datasets with a KP–RP ratio around 1 generally produced better results than those with ratios distant from 1. Results produced by maxFpb had specificity that was too low for very common species modeled with Random Forest and Maxent. In contrast, maxSSS produced consistent results regardless of which dataset was used. The estimated prevalence was almost always biased, and the bias was very large for DOMAIN and Random Forest predictions. We conclude that maxFpb is affected by the KP–RP ratio of the threshold selection datasets, whereas maxSSS is almost unaffected by this ratio. Unbiased estimates of prevalence are difficult to obtain with the threshold‐based approach.
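
To make the two threshold criteria and the evaluation quantities concrete, here is a minimal Python sketch under several stated assumptions. A point is treated as predicted present when its score is at least the threshold. Every observed score is tried as a candidate threshold, and ties go to the lowest one. F_pb is approximated as the ordinary F measure with the random points treated as absences, which may differ in detail from the published F_pb definition. The function names (max_sss, max_fpb, predicted_prevalence, and so on) and all scores are hypothetical and for illustration only; this is not the authors' code. The toy data only illustrate the mechanism. They are not results from the study.

```python
"""Sketch: threshold selection (maxSSS, maxFpb) and evaluation metrics.

Assumptions: higher score = more suitable; a point is predicted present
when score >= threshold; all data below are hypothetical.
"""
from typing import List, Sequence, Tuple


def confusion(pos: Sequence[float], neg: Sequence[float],
              t: float) -> Tuple[int, int, int, int]:
    """Return (tp, fn, fp, tn); `neg` holds absences, or random points
    treated as absences when only presence-only data are available."""
    tp = sum(1 for s in pos if s >= t)
    fp = sum(1 for s in neg if s >= t)
    return tp, len(pos) - tp, fp, len(neg) - fp


def sensitivity_specificity(pos, neg, t) -> Tuple[float, float]:
    tp, fn, fp, tn = confusion(pos, neg, t)
    return tp / (tp + fn), tn / (tn + fp)


def tss(pos, neg, t) -> float:
    """True skill statistic = sensitivity + specificity - 1."""
    sens, spec = sensitivity_specificity(pos, neg, t)
    return sens + spec - 1


def f_measure(pos, neg, t) -> float:
    """F = 2*precision*recall / (precision + recall) = 2tp / (2tp + fp + fn)."""
    tp, fn, fp, _ = confusion(pos, neg, t)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def candidates(pos, neg) -> List[float]:
    # Observed scores in ascending order; max() keeps the first maximum,
    # so ties are resolved in favour of the lowest threshold.
    return sorted(set(pos) | set(neg))


def max_sss(presences, random_pts) -> float:
    """Threshold maximising sensitivity + specificity (presences vs random points)."""
    return max(candidates(presences, random_pts),
               key=lambda t: sum(sensitivity_specificity(presences, random_pts, t)))


def max_fpb(presences, random_pts) -> float:
    """Threshold maximising F with random points treated as absences
    (an assumed stand-in for F_pb)."""
    return max(candidates(presences, random_pts),
               key=lambda t: f_measure(presences, random_pts, t))


def predicted_prevalence(eval_scores, t) -> float:
    """Number of predicted presences divided by all evaluation points."""
    return sum(1 for s in eval_scores if s >= t) / len(eval_scores)


if __name__ == "__main__":
    presences = [0.9, 0.8, 0.7, 0.4]                 # hypothetical known presences
    random_pts = [0.85, 0.6, 0.3, 0.2, 0.1, 0.05]    # hypothetical random points
    for k in (1, 3):  # same random points repeated: KP-RP ratio 4:6, then 4:18
        rp = random_pts * k
        print(k, max_sss(presences, rp), max_fpb(presences, rp))

    t = max_sss(presences, random_pts)
    eval_pres = [0.95, 0.75, 0.5, 0.35]              # hypothetical evaluation presences
    eval_abs = [0.65, 0.45, 0.25, 0.15, 0.1, 0.02]   # hypothetical evaluation absences
    print(round(tss(eval_pres, eval_abs, t), 3),
          round(f_measure(eval_pres, eval_abs, t), 3),
          predicted_prevalence(eval_pres + eval_abs, t))
```

Running the sketch prints `1 0.4 0.4`, then `3 0.4 0.7`, then `0.417 0.667 0.5`. Repeating the random points changes only the KP–RP ratio. The maxSSS threshold does not move because sensitivity and specificity are each proportions within one class. The F‐based threshold shifts because the count of predicted‐present random points enters F directly. In the evaluation line, the predicted prevalence (0.5) overshoots the toy dataset's true prevalence (4 of 10 points, 0.4). This shows how a threshold‐based prevalence estimate can be biased.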
