Optimizing ensembles of small models for predicting the distribution of species with few occurrences

Handling Editor: Nick Isaac

Abstract

1. Ensembles of Small Models (ESMs) represent a novel strategy for species distribution modelling with few observations. ESMs are built by calibrating many small models and then averaging them into an ensemble model in which the small models are weighted by their cross-validated scores of predictive performance. In a previous paper (Breiner, Guisan, Bergamini, & Nobis, Methods in Ecology and Evolution, 6, 1210–1218, 2015), we reported two major findings. First, ESMs proved largely superior to standard models in terms of model performance and transferability. Second, ESMs combining different modelling techniques did not clearly improve model performance compared to single-technique ESMs. However, ESMs often require a large computational effort, which can become problematic when modelling large numbers of species. Given the appealing new perspectives offered by ESMs, it is especially important to investigate whether some techniques yield increased performance while saving computation time, and could thus be used predominantly for building ESMs.
2. Here, we present results from a reanalysis of a subset of the data used in Breiner et al. (2015). More specifically, we ran ESMs: (1) fitted with 10 modelling techniques separately (in Breiner et al., 2015, we used only three techniques); and (2) using various parameter options for each modelling technique (i.e., model tuning).
3. We show that ESMs vary in model performance and computation time across techniques, and that some techniques, namely GLM (generalized linear models), CTA (classification tree analysis) and ANN (artificial neural networks), are advantageous in terms of optimizing both model performance and computation time. Using one of these modelling techniques could thus reduce computation time compared to more computing-intensive techniques such as GBM (generalized boosted models). Next, we show that parameter tuning can improve the performance and transferability of ESMs, but often at the cost of computation time. Parameter tuning could therefore be used when computing resources are not a limiting factor.
4. These findings help improve the applicability and performance of ESMs when applied to large numbers of species.
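
The ensemble step described in point 1 (small models averaged with weights given by cross-validated predictive performance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes bivariate small models (one per pair of predictors), takes each small model's cross-validated AUC as given, converts it to Somers' D (2 * AUC - 1) as the weight, and drops models no better than random. The choice of Somers' D as the weighting score and all function names (`bivariate_combinations`, `auc`, `somers_d`, `esm_ensemble`) are hypothetical.

```python
from itertools import combinations


def bivariate_combinations(predictors):
    """Return every pair of predictors; each pair defines one small model.

    Assumption: small models are bivariate, so p predictors give p*(p-1)/2 models.
    """
    return list(combinations(predictors, 2))


def auc(scores, labels):
    """Rank-based AUC: probability that a random presence (label 1) scores
    higher than a random absence (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def somers_d(auc_value):
    """Rescale AUC so that 0 means random and 1 means perfect (assumed weight)."""
    return 2.0 * auc_value - 1.0


def esm_ensemble(predictions, cv_auc):
    """Weighted average of small-model predictions across sites.

    predictions: dict model_id -> list of predicted suitabilities (same site order)
    cv_auc:      dict model_id -> cross-validated AUC of that small model
    Small models with Somers' D <= 0 (no better than random) are dropped.
    """
    weights = {}
    for model_id, score in cv_auc.items():
        d = somers_d(score)
        if d > 0:
            weights[model_id] = d
    if not weights:
        raise ValueError("no small model performs better than random")
    total = sum(weights.values())
    n_sites = len(predictions[next(iter(weights))])
    ensemble = [0.0] * n_sites
    for model_id, w in weights.items():
        # Each retained small model contributes in proportion to its weight.
        for i, p in enumerate(predictions[model_id]):
            ensemble[i] += (w / total) * p
    return ensemble


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the study.
    preds = {"A": [0.9, 0.2, 0.6], "B": [0.7, 0.4, 0.5], "C": [0.1, 0.9, 0.3]}
    scores = {"A": 0.9, "B": 0.7, "C": 0.45}
    print(esm_ensemble(preds, scores))
```

With these toy inputs, model C (AUC 0.45) gets a negative Somers' D and is dropped, while A and B are averaged with weights 0.8 and 0.4, so the first site's ensemble suitability is (0.8 * 0.9 + 0.4 * 0.7) / 1.2, about 0.83. In a full ESM the AUCs would come from held-out predictions in each cross-validation run, and the whole procedure is repeated per modelling technique, so the number of fitted models grows with both the number of predictor pairs and the number of techniques.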

[1] Steven J. Phillips, et al. The art of modelling range‐shifting species, 2010.

[2] A. Hirzel, et al. Evaluating the ability of habitat suitability models to predict species presences, 2006.

[3] Yoshua Bengio, et al. Pattern Recognition and Neural Networks, 1995.

[4] A. Peterson, et al. No silver bullets in correlative ecological niche modelling: insights from testing among many potential algorithms for niche estimation, 2015.

[5] J. A. Swets, et al. Measuring the accuracy of diagnostic systems, 1988, Science.

[6] Antoine Guisan, et al. Overcoming the rare species modelling paradox: a novel hierarchical framework applied to an Iberian endemic plant, 2010.

[7] Carsten F. Dormann, et al. Computing AIC for black-box models using generalized degrees of freedom: A comparison with cross-validation, 2016, Commun. Stat. Simul. Comput.

[8] Robert P. Anderson, et al. Estimating optimal complexity for ecological niche models: A jackknife approach for species with small sample sizes, 2013.

[9] Timothy P. Jurka, et al. maxent: An R Package for Low-memory Multinomial Logistic Regression with Support for Semi-automated Text Classification, 2012, R J.

[10] Trevor H. Booth, et al. bioclim: the first species distribution modelling package, its early applications and relevance to most current MaxEnt studies, 2014.

[11] J. Busby. BIOCLIM - a bioclimate analysis and prediction system, 1991.

[12] Antoine Guisan, et al. ecospat: an R package to support spatial analyses and modeling of species niches and distributions, 2017.

[13] A. Guisan, et al. Predicting richness and composition in mountain insect communities at high resolution: a new test of the SESAM framework, 2015.

[14] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[15] Antoine Guisan, et al. Measuring the relative effect of factors affecting species distribution model predictions, 2014.

[16] Rubén G. Mateo, et al. Impact of model complexity on cross-temporal transferability in Maxent species distribution models: An assessment using paleobotanical data, 2015.

[17] R. Newcombe. Two-sided confidence intervals for the single proportion: comparison of seven methods, 1998, Statistics in Medicine.

[18] R. Tibshirani, et al. Generalized Additive Models, 1986.

[19] M. Araújo, et al. BIOMOD – a platform for ensemble forecasting of species distributions, 2009.

[20] R. A. Garcia, et al. Conservation implications of omitting narrow‐ranging taxa from species distribution models, now and in the future, 2014.

[21] Sophia Ananiadou, et al. Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty, 2009, ACL.

[22] Xavier Robin, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves, 2011, BMC Bioinformatics.

[23] Max Kuhn, et al. Applied Predictive Modeling, 2013.

[24] Jane Elith, et al. The evaluation strip: A new and robust method for plotting predicted responses from species distribution models, 2005.

[25] A. Peterson, et al. Effects of sample size on the performance of species distribution models, 2008.

[26] R. Pearson, et al. Predicting species distributions from small numbers of occurrence records: A test case using cryptic geckos in Madagascar, 2006.

[27] Eve McDonald-Madden, et al. Predicting species distributions for conservation decisions, 2013, Ecology Letters.

[28] A. Townsend Peterson, et al. Rethinking receiver operating characteristic analysis applications in ecological niche modeling, 2008.

[29] J. Friedman, et al. Multivariate adaptive regression splines, 1991.

[30] Robert P. Anderson, et al. Maximum entropy modeling of species geographic distributions, 2006.

[31] Max Kuhn, et al. Building Predictive Models in R Using the caret Package, 2008.

[32] Wei-Yin Loh, et al. Classification and regression trees, 2011, WIREs Data Mining Knowl. Discov.

[33] Dan L. Warren, et al. Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria, 2011, Ecological Applications.

[34] F. T. Breiner, et al. Overcoming limitations of modelling rare species by using ensembles of small models, 2015, Methods in Ecology and Evolution, 6, 1210–1218.

[35] Steven J. Phillips, et al. What matters for predicting the occurrences of trees: techniques, data, or species' characteristics?, 2007.

[36] Robert A. Boria, et al. ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models, 2014.