The effects of species’ range sizes on the accuracy of distribution models: ecological phenomenon or statistical artefact?

Summary

1. Conservation scientists and resource managers increasingly employ empirical distribution models to aid decision-making. However, such models are not equally reliable for all species, and range size can affect their performance. We examined the extent to which this effect reflects statistical artefacts arising from the influence of range size on the sample size and sampling prevalence (proportion of samples representing species presence) of data used to train and test models.

2. Our analyses used both simulated data and empirical distribution models for 32 bird species endemic to South Africa, Lesotho and Swaziland. Models were built with either logistic regression or non-linear discriminant analysis, and assessed with four measures of model accuracy: sensitivity, specificity, Cohen's kappa and the area under the curve (AUC) of receiver-operating characteristic (ROC) plots. Environmental indices derived from Fourier-processed satellite imagery served as predictors.

3. We first followed conventional modelling practice to illustrate how range size might influence model performance when sampling prevalence reflects species’ natural prevalences. We then demonstrated that this influence is primarily artefactual. Statistical artefacts can arise during model assessment because Cohen's kappa responds systematically to changes in prevalence. AUC, in contrast, is largely unaffected and is thus a more reliable measure of model performance. Statistical artefacts also arise during model fitting: both logistic regression and discriminant analysis are sensitive to the sample size and sampling prevalence of training data, and both perform best when sample size is large and prevalence intermediate.

4. Synthesis and applications. Species’ ecological characteristics may influence the performance of distribution models. Statistical artefacts, however, can confound results in comparative studies seeking to identify these characteristics. To mitigate artefactual effects, we recommend careful reporting of sampling prevalence, the use of AUC as the measure of accuracy, and the use of fixed, intermediate levels of sampling prevalence in comparative studies.
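
The four accuracy measures named above, and the way kappa (but not AUC) shifts with prevalence, can be illustrated with a minimal Python sketch. The confusion counts and scores are hypothetical, not the study's data, and the helper names `confusion_metrics` and `auc` are illustrative. AUC is computed in its rank (Mann-Whitney) form, a standard equivalent of the area under the ROC plot, not necessarily the procedure the authors used.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity and Cohen's kappa from a 2x2 confusion matrix.

    tp/fn: presences predicted present/absent; fp/tn: absences predicted present/absent.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # proportion of presences correctly predicted
    specificity = tn / (tn + fp)  # proportion of absences correctly predicted
    observed = (tp + tn) / n      # observed proportion of agreement
    # Chance agreement from the marginals:
    # P(observed present) * P(predicted present) + P(observed absent) * P(predicted absent).
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


def auc(presence_scores: Sequence[float], absence_scores: Sequence[float]) -> float:
    """AUC in rank form: probability that a random presence outscores a random
    absence, with ties counted as one half."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


if __name__ == "__main__":
    # Hypothetical model with sensitivity = specificity = 0.8, tested on 1000 sites.
    balanced = confusion_metrics(tp=400, fp=100, fn=100, tn=400)  # prevalence 0.5
    rare = confusion_metrics(tp=80, fp=180, fn=20, tn=720)        # prevalence 0.1
    print(round(balanced["kappa"], 3), round(rare["kappa"], 3))   # 0.6 0.351

    # Replicating the absences lowers prevalence (0.5 -> 0.25) without changing
    # the within-class score distributions, so the rank-based AUC is unchanged.
    presences = [0.9, 0.8, 0.4]
    absences = [0.7, 0.3, 0.2]
    print(round(auc(presences, absences), 3), round(auc(presences, absences * 3), 3))  # 0.889 0.889
```

With sensitivity and specificity both held at 0.8, kappa falls from 0.6 to about 0.35 as prevalence drops from 0.5 to 0.1, the kind of systematic response to prevalence described in the summary, while AUC stays at 8/9 in the replicated-absence example.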

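The recommendation to use a fixed, intermediate sampling prevalence in comparative studies can be sketched as stratified subsampling of the training data. This is a minimal Python sketch under stated assumptions: records are hypothetical (site, label) pairs with 1 for presence and 0 for absence, the function names `sampling_prevalence` and `subsample_to_prevalence` are illustrative, and 0.5 is only one possible "intermediate" prevalence; the summary does not prescribe this exact procedure.

```python
import random
from typing import List, Sequence, Tuple

# (hypothetical site identifier, 1 = presence / 0 = absence)
Record = Tuple[object, int]


def sampling_prevalence(labels: Sequence[int]) -> float:
    """Proportion of samples that represent species presence."""
    return sum(labels) / len(labels)


def subsample_to_prevalence(records: Sequence[Record], n: int, prevalence: float,
                            seed: int = 0) -> List[Record]:
    """Draw n records without replacement so that round(n * prevalence) are presences."""
    presences = [r for r in records if r[1] == 1]
    absences = [r for r in records if r[1] == 0]
    n_pres = round(n * prevalence)
    n_abs = n - n_pres
    if n_pres > len(presences) or n_abs > len(absences):
        raise ValueError("not enough presences or absences for the requested design")
    rng = random.Random(seed)  # fixed seed so the training set is reproducible
    sample = rng.sample(presences, n_pres) + rng.sample(absences, n_abs)
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    # Hypothetical atlas: 1000 grid cells, species recorded in the first 60.
    records = [(cell, 1 if cell < 60 else 0) for cell in range(1000)]
    print(sampling_prevalence([y for _, y in records]))  # 0.06 (natural prevalence)
    train = subsample_to_prevalence(records, n=100, prevalence=0.5, seed=1)
    print(len(train), sampling_prevalence([y for _, y in train]))  # 100 0.5
```

Holding both `n` and `prevalence` constant across species keeps sample size and sampling prevalence from varying with range size, and printing `sampling_prevalence` for each training and test set is one simple way to report it.
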