Testing the predictive performance of distribution models

Distribution models are used to predict the likelihood of occurrence or abundance of a species at locations where census data are not available. An integral part of modelling is the testing of model performance. We compared different schemes and measures for testing model performance using 79 species from the North American Breeding Bird Survey. The four testing schemes we compared featured increasing independence between test and training data: resubstitution, random data hold-out and two spatially segregated data hold-out designs. The different testing measures also addressed different levels of information content in the dependent variable: regression R² for absolute abundance, squared correlation coefficient r² for relative abundance and AUC/Somers' D for presence/absence. We found that higher levels of independence between test and training data lead to lower assessments of prediction accuracy. Even for data collected independently, spatial autocorrelation leads to dependence between random hold-out test data and training data, and thus to inflated measures of model performance. While there is a general awareness of the importance of autocorrelation to model building and hypothesis testing, its consequences via violation of independence between training and testing data have not been addressed systematically and comprehensively before. Furthermore, increasing information content (from correctly classifying presence/absence, to predicting relative abundance, to predicting absolute abundance) leads to decreasing predictive performance. The current tests for presence/absence distribution models are typically overly optimistic because a) the test and training data are not independent and b) the correct classification of presence/absence has relatively low information content, and thus less capability to address ecological and conservation questions, than a prediction of abundance.
Meaningful evaluation of model performance requires testing on spatially independent data if the intended application of the model is to predict into new geographic or climatic space, which arguably is the case for most applications of distribution models.
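
The testing schemes and measures named above can be illustrated with a minimal sketch. It is not the study's code: the simulated transect data, the cubic least-squares model and all function names (`random_holdout`, `spatial_holdout`, `compare_schemes` and so on) are assumptions made purely for illustration. It covers resubstitution, random hold-out and a single spatially segregated hold-out, and computes regression R², squared correlation r², AUC and Somers' D (for a binary outcome, D = 2·AUC − 1).

```python
import numpy as np

def random_holdout(n, test_frac, rng):
    """Random hold-out: test sites are scattered among the training sites."""
    idx = rng.permutation(n)
    n_test = max(1, int(round(test_frac * n)))
    return idx[n_test:], idx[:n_test]          # (train, test)

def spatial_holdout(x_coord, test_frac):
    """Spatially segregated hold-out: the test set is one contiguous block
    (here the sites with the largest coordinate)."""
    order = np.argsort(x_coord)
    n_test = max(1, int(round(test_frac * len(x_coord))))
    return order[:-n_test], order[-n_test:]    # (train, test)

def r_squared(obs, pred):
    """Regression R^2 (absolute abundance): 1 - SSE/SST; can be negative."""
    sse = np.sum((obs - pred) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - sse / sst

def squared_correlation(obs, pred):
    """Squared Pearson r (relative abundance): ignores offset and scale."""
    return np.corrcoef(obs, pred)[0, 1] ** 2

def auc(presence, score):
    """AUC: chance that a presence site outscores an absence site (ties = 1/2)."""
    pos, neg = score[presence], score[~presence]
    if pos.size == 0 or neg.size == 0:
        return float("nan")                    # undefined without both classes
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def somers_d(presence, score):
    """Somers' D for a binary outcome is a rescaling of AUC."""
    return 2.0 * auc(presence, score) - 1.0

def compare_schemes(seed=0, n=400, test_frac=0.25):
    """Fit one model per scheme on simulated data and score it on the test set."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)                            # position on a transect
    env = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)   # hypothetical covariate
    # Counts depend on env plus a spatially smooth term the covariate misses.
    lam = np.exp(0.3 + 1.0 * env + 0.8 * np.cos(3 * np.pi * x))
    abundance = rng.poisson(lam).astype(float)
    presence = abundance > 0
    # Assumed model: least squares on env and a cubic in the coordinate.
    X = np.column_stack([np.ones(n), env, x, x ** 2, x ** 3])
    everything = np.arange(n)
    schemes = {
        "resubstitution": (everything, everything),
        "random hold-out": random_holdout(n, test_frac, rng),
        "spatial hold-out": spatial_holdout(x, test_frac),
    }
    results = {}
    for name, (train, test) in schemes.items():
        beta, *_ = np.linalg.lstsq(X[train], abundance[train], rcond=None)
        pred = X[test] @ beta
        results[name] = {
            "R2": r_squared(abundance[test], pred),
            "r2": squared_correlation(abundance[test], pred),
            "AUC": auc(presence[test], pred),
            "SomersD": somers_d(presence[test], pred),
        }
    return results

for scheme, scores in compare_schemes().items():
    print(scheme, {k: round(float(v), 3) for k, v in scores.items()})
```

The three measures are applied to the same predictions on purpose: R² penalises errors in absolute abundance, r² only asks whether predictions rise and fall with the observations, and AUC/Somers' D only asks whether occupied sites are ranked above unoccupied ones. The exact scores depend on the simulated data, so the sketch should be read as a template for the comparison rather than as a reproduction of the reported results.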
