The continuing challenges of testing species distribution models

1. Species distribution models could bring manifold benefits across ecology, but require careful testing to prove their reliability and guide users. Shortcomings in testing are often evident, failing to reflect recent methodological developments and changes in the way models are applied. We considered some of the fundamental issues.

2. Generalizability is a basic requirement for predictive models, describing their capacity to produce accurate predictions with new data, i.e. in real applications beyond model training. Tests of generalizability should be as rigorous as possible: ideally using a large number of independent test sites (≥ 200–300) that represent anticipated applications. Bootstrapping identifies the role of overfitting of the training data in limiting a model's generalizability.

3. Predictions from most distribution models are continuous variables. Their accuracy may be described by discrimination and calibration components. Discriminatory ability describes how well a model separates occupied from unoccupied sites. It is independent of species prevalence and is readily comparable between models. Rank correlation coefficients, such as the concordance index, are effective measures.

4. Calibration describes the numerical accuracy of predictions (e.g. whether 40% of sites with predicted probabilities of 0·40 are occupied) but is frequently overlooked in model testing. Poor calibration could mislead any conservation efforts utilizing models to estimate the ‘value’ of different sites for a given species. Effective assessments can be made using smoothed calibration plots.

5. The effects of species prevalence on nominal presence–absence predictions are well known. The currently preferred accuracy measure, Cohen's κ, has weaknesses. We argue that mutual information measures, based in information theory, may be more appropriate.

6. Synthesis and applications. Model evaluation must be informative and should ideally: (i) define generalizability in detail; (ii) separate the discrimination and calibration components of accuracy and test both; (iii) adopt assessment techniques that permit more valid intermodel comparisons; (iv) avoid nominal presence–absence evaluation where possible and consider information-theoretic measures; and (v) utilize the full range of techniques to help diagnose the causes of prediction problems. Few modellers in applied ecology and conservation biology satisfy these needs, making it difficult for others to evaluate models and identify potential misuses. The problems are real, and if uncorrected will damage conservation efforts through the inaccurate assessment of distribution and habitat preferences of important organisms.
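
As a rough illustration of the measures named above, the following Python sketch computes a concordance index from continuous predictions, and Cohen's κ and a mutual information measure from a 2 × 2 presence–absence table. It is a minimal sketch rather than the method used in the paper: the function names are hypothetical, ties in the concordance index are assumed to count one half, and the mutual information is assumed to be normalized by the entropy of the observed occupancy, which is only one of several possible normalizations.

```python
import math


def concordance_index(probs, observed):
    """Share of (occupied, unoccupied) site pairs in which the occupied
    site has the higher predicted value; ties count 0.5 (assumption)."""
    pos = [p for p, y in zip(probs, observed) if y == 1]
    neg = [p for p, y in zip(probs, observed) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one occupied and one unoccupied site")
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


def cohens_kappa(tp, fp, fn, tn):
    """Chance-corrected agreement between predicted and observed
    presence-absence, from the four cells of the confusion matrix."""
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def _entropy(counts):
    """Shannon entropy (bits) of a list of counts; empty cells are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def normalized_mutual_information(tp, fp, fn, tn):
    """Mutual information between predicted and observed classes, divided
    by the entropy of the observed classes (assumed normalization)."""
    h_obs = _entropy([tp + fn, fp + tn])
    h_pred = _entropy([tp + fp, fn + tn])
    h_joint = _entropy([tp, fp, fn, tn])
    if h_obs == 0.0:
        raise ValueError("observed data contain only one class")
    return (h_obs + h_pred - h_joint) / h_obs


if __name__ == "__main__":
    # Hypothetical predictions and observations for four test sites.
    print(concordance_index([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
    # Hypothetical confusion matrix after thresholding the predictions.
    print(cohens_kappa(tp=40, fp=10, fn=10, tn=40))
    print(normalized_mutual_information(tp=40, fp=10, fn=10, tn=40))
```

The concordance index is computed directly on the continuous predictions, whereas κ and the mutual information measure require the predictions to be thresholded first, which is where the choice of threshold and species prevalence enter.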
