ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models

Summary

Recent studies have demonstrated a need for increased rigour in building and evaluating ecological niche models (ENMs) based on presence-only occurrence data. Two major goals are to balance goodness-of-fit with model complexity (e.g. by ‘tuning’ model settings) and to evaluate models with spatially independent data. These issues are especially critical for data sets suffering from sampling bias, and for studies that require transferring models across space or time (e.g. responses to climate change or spread of invasive species). Efficient implementation of procedures to accomplish these goals, however, requires automation. We developed ENMeval, an R package that: (i) creates data sets for k-fold cross-validation using one of several methods for partitioning occurrence data (including options for spatially independent partitions), (ii) builds a series of candidate models using Maxent with a variety of user-defined settings and (iii) provides multiple evaluation metrics to aid in selecting optimal model settings. The six methods for partitioning data are n−1 jackknife, random k-folds (= bins), user-specified folds and three methods of masked geographically structured folds. ENMeval quantifies six evaluation metrics: the area under the curve of the receiver-operating characteristic plot for test localities (AUC_TEST), the difference between training and testing AUC (AUC_DIFF), two different threshold-based omission rates for test localities and the Akaike information criterion corrected for small sample sizes (AICc). We demonstrate ENMeval by tuning model settings for eight tree species of the genus Coccoloba in Puerto Rico based on AICc. Evaluation metrics varied substantially across model settings, and models selected with AICc differed from default ones. In summary, ENMeval facilitates the production of better ENMs and should promote future methodological research on many outstanding issues.
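As a rough illustration of two ideas named in the summary, the sketch below assigns occurrence records to bins for random k-fold and n−1 jackknife partitioning, and computes AICc from a log-likelihood. It is written in Python for illustration only and is not ENMeval's R interface: the function names (`random_kfold_labels`, `jackknife_labels`, `aicc`) and the example values are hypothetical, and the AICc function uses the standard textbook formula, which may differ in detail from how the package derives the likelihood and parameter count from a Maxent model.

```python
import math
import random


def random_kfold_labels(n_points, k, seed=0):
    """Randomly assign each occurrence record to one of k bins of near-equal size."""
    labels = [i % k for i in range(n_points)]  # balanced bin labels 0..k-1
    random.Random(seed).shuffle(labels)        # randomise which record gets which bin
    return labels


def jackknife_labels(n_points):
    """n-1 jackknife: every record is its own bin, so k equals n."""
    return list(range(n_points))


def aicc(log_likelihood, n_params, n_obs):
    """Standard small-sample AIC: 2K - 2 ln(L) + 2K(K + 1) / (n - K - 1)."""
    denom = n_obs - n_params - 1
    if denom <= 0:
        return math.nan  # correction term is undefined when K >= n - 1
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / denom


# Hypothetical usage: 10 records split into 4 bins, then AICc for one candidate model.
bins = random_kfold_labels(10, 4)
score = aicc(log_likelihood=-100.0, n_params=5, n_obs=50)
```

Under this formula, a candidate model with many parameters relative to the number of occurrence records is penalised heavily, which is why a model chosen by AICc can differ from one built with default settings.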
