Assessing the accuracy of species distribution models more thoroughly

Species distribution models (SDMs) are empirical models that relate species occurrence to environmental variables using statistical or other response surfaces. SDMs can be used to address a range of theoretical and applied problems in ecology and environmental science. The success of these applications depends on the accuracy of the models. In this study we propose an approach for assessing the accuracy of species distribution models more thoroughly. The approach has three aspects. The first is to use several accuracy indices that measure not only model discrimination capability but also model reliability. Discrimination is the power of the model to differentiate presences from absences; reliability (calibration) is the capability of the predicted probabilities to reflect the true probabilities that the species occurs at individual locations. Previous studies have shown that some accuracy measures are sensitive to the prevalence of the test dataset and that others are not. All reliability measures are sensitive to prevalence, whereas only some discrimination measures fall into the latter, prevalence-insensitive group. Many researchers recommend using prevalence-insensitive measures in model accuracy assessment. However, with that approach the calibration power of the models cannot be assessed. We argue that calibration measures should also be provided in model accuracy assessments. The second aspect is to provide confidence intervals for the estimates of the accuracy indices. Analytical methods, both parametric and nonparametric, have been introduced for constructing confidence intervals for many accuracy indices. Computer-intensive methods (e.g. bootstrap and jackknife) can also be used, and they are more attractive than the traditional analytical methods because (1) they make fewer statistical assumptions and (2) they are applicable to virtually any accuracy measure.
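To make the second aspect concrete, the following is a minimal sketch of a bootstrap percentile confidence interval for one accuracy index. It is an illustration under stated assumptions, not the procedure used in the study: the function names `auc` and `bootstrap_ci`, the choice of AUC as the index, and the defaults `n_boot=1000` and `alpha=0.05` are all hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the share of presence/absence
    pairs in which the presence has the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for any metric(labels, scores).
    Assumption: test sites are resampled with replacement as independent units."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Because `bootstrap_ci` only requires a function of the observed labels and the predicted probabilities, the same routine could in principle be reused for other indices, which is the sense in which resampling methods apply to virtually any accuracy measure.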
The third aspect is to assess accuracy across a range of test data prevalence, since some accuracy indices are dependent on this property of the test data. Test data with differing levels of prevalence will give a range of results for the same accuracy index, so assessing accuracy at only one level of prevalence will not provide a complete picture of the accuracy of the models. The range of test data prevalence can be set by researchers according to their knowledge of the target species, or it can be taken from the confidence interval of the population prevalence estimated from the sample data, if the data can be considered a random sample of the population. In this paper, we use an Australian native plant species, Forest Wire-grass (Tetrarrhena juncea), as an example to demonstrate our approach. The accuracy of two models, one from a machine learning method (Random Forest, RF) and the other from a statistical method (generalized additive model, GAM), was assessed using nine accuracy indices along a range of test data prevalence (i.e. the 95% confidence interval of the population prevalence estimated from the sample data using the bootstrap percentile method), and a bootstrap method was used to construct the confidence intervals for the accuracy indices. With this approach, the species distribution models were assessed more thoroughly than a single index at a single prevalence would allow.
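The third aspect can be sketched in the same way. The code below is only an illustration under assumptions: it estimates a bootstrap percentile interval for prevalence, then rebuilds test sets at chosen prevalence levels by resampling presences and absences separately, and evaluates a prevalence-sensitive index (the Brier score is used here as an example of a reliability measure). The stratified resampling scheme and all function names are hypothetical and are not taken from the study.

```python
import random

def prevalence_ci(labels, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap percentile interval for the proportion of presences."""
    rng = random.Random(seed)
    n = len(labels)
    props = sorted(sum(rng.choice(labels) for _ in range(n)) / n
                   for _ in range(n_boot))
    lower = props[int((alpha / 2) * n_boot)]
    upper = props[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

def resample_to_prevalence(labels, scores, prevalence, size, rng):
    """Draw a test set of the given size with the requested prevalence by
    sampling presences and absences separately, with replacement."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    n_pos = round(prevalence * size)
    n_neg = size - n_pos
    new_labels = [1] * n_pos + [0] * n_neg
    new_scores = ([rng.choice(pos) for _ in range(n_pos)]
                  + [rng.choice(neg) for _ in range(n_neg)])
    return new_labels, new_scores

def brier_score(labels, scores):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)

def index_across_prevalence(labels, scores, metric, prevalences,
                            size=500, n_rep=200, seed=0):
    """Average value of a metric at each requested test data prevalence."""
    rng = random.Random(seed)
    result = {}
    for p in prevalences:
        values = [metric(*resample_to_prevalence(labels, scores, p, size, rng))
                  for _ in range(n_rep)]
        result[p] = sum(values) / n_rep
    return result
```

A possible usage is to pass the two ends of `prevalence_ci(labels)`, plus a few values in between, as `prevalences`, and to report the index at each level rather than at the sample prevalence alone.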
