Fit-for-Purpose: Species Distribution Model Performance Depends on Evaluation Criteria – Dutch Hoverflies as a Case Study

Understanding species distributions and the factors limiting them is an important topic in ecology and conservation, with applications that include nature reserve selection and the prediction of climate change impacts. While Species Distribution Models (SDMs) are the main tool used for these purposes, choosing the best SDM algorithm is not straightforward, as algorithms are plentiful and can be applied in many different ways. SDMs are mainly used to gain insight into 1) overall species distributions, 2) their past, present and future probability of occurrence, and/or 3) their ecological niche limits (also referred to as ecological niche modelling). The fact that these three aims may require different models and outputs is, however, rarely considered and has not been evaluated consistently. Here we use data from a systematically sampled set of species occurrences to test the performance of SDMs across several commonly used algorithms. The species range in distribution pattern from rare to common and from local to widespread. We compare overall model fit (representing species distribution), the accuracy of the predictions at multiple spatial scales, and the consistency in the selection of environmental correlates, each assessed across multiple modelling runs. As expected, the choice of modelling algorithm determines model outcome. However, model quality depends not only on the algorithm, but also on the measure of model fit used and the scale at which it is applied. Although model fit was higher for the consensus approach and Maxent, Maxent and GAM models were more consistent in estimating local occurrence, while RF and GBM showed higher consistency in environmental variable selection. Model outcomes diverged more for narrowly distributed species than for widespread species.
We suggest that matching study aims with the modelling approach is essential in species distribution modelling, and we provide suggestions on how to do this for different modelling aims and species' data characteristics (i.e. sample size, spatial distribution).
