What do we gain from simplicity versus complexity in species distribution models?

Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence–environment relationships with statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear and additive models, tree-based models, maximum entropy) and the variety of ways in which such models can be implemented permit substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence–environment relationships and the number of parameters used to describe them, and we search for insights into whether additional complexity is informative or superfluous. By building ‘underfit’ models, which have insufficient flexibility to describe observed occurrence–environment relationships, we risk misunderstanding the factors shaping species distributions. By building ‘overfit’ models, which have excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on the study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing underfitting with overfitting and, consequently, how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinion that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM-building approaches best advances our knowledge of current and future species ranges.
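To make the trade-off between underfitting and overfitting concrete, the following is a minimal sketch in Python with NumPy. It assumes a hypothetical species with a unimodal response (quadratic on the logit scale) to one simulated environmental gradient. Complexity is controlled by the polynomial degree of a presence–absence logistic regression, and each fit is compared by training log-likelihood, held-out log-likelihood and AIC. The simulated data, the helper names (`simulate`, `design`, `log_lik`, `fit_logistic`) and the Legendre polynomial basis are illustrative assumptions, not the authors' method.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(1)


def simulate(n):
    """Hypothetical species: logit P(presence) = 1 - 2 x^2 on a gradient x in [-2, 2]."""
    x = rng.uniform(-2.0, 2.0, n)
    p = 1.0 / (1.0 + np.exp(-(1.0 - 2.0 * x**2)))
    y = (rng.uniform(size=n) < p).astype(float)
    return x, y


def design(x, degree):
    """Legendre polynomial basis of x rescaled to [-1, 1]; column 0 is the intercept."""
    return legendre.legvander(x / 2.0, degree)


def log_lik(beta, X, y):
    """Bernoulli log-likelihood with a logit link, computed without overflow."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic regression via Newton-Raphson with step halving."""
    beta = np.zeros(X.shape[1])
    ll = log_lik(beta, X, y)
    for _ in range(n_iter):
        eta = X @ beta
        p = np.exp(-np.logaddexp(0.0, -eta))  # numerically stable inverse logit
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        # The tiny jitter only stabilizes the linear solve; it does not move the optimum.
        step = np.linalg.solve(hess + 1e-9 * np.eye(X.shape[1]), grad)
        t = 1.0
        while t > 1e-8:
            candidate = beta + t * step
            new_ll = log_lik(candidate, X, y)
            if new_ll >= ll:
                break
            t /= 2.0
        else:
            break  # no improving step left: treat as converged
        beta, ll = candidate, new_ll
    return beta


x_train, y_train = simulate(400)
x_test, y_test = simulate(400)

# Complexity = polynomial degree; each model has degree + 1 parameters.
results = {}
for degree in (1, 2, 4, 8):
    X_train = design(x_train, degree)
    beta = fit_logistic(X_train, y_train)
    train_ll = log_lik(beta, X_train, y_train)
    test_ll = log_lik(beta, design(x_test, degree), y_test)
    aic = -2.0 * train_ll + 2.0 * (degree + 1)
    results[degree] = (train_ll, test_ll, aic)
    print(f"degree={degree}  k={degree + 1}  train logL={train_ll:8.2f}  "
          f"held-out logL={test_ll:8.2f}  AIC={aic:8.2f}")
```

Because these models are nested, training log-likelihood can only rise as parameters are added, so it cannot by itself identify an appropriate level of complexity. Held-out log-likelihood and AIC account for the cost of extra flexibility. The degree-1 (monotonic) model underfits the unimodal response. This single-family comparison is also simpler than the cross-approach model selection the abstract describes.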
