A checklist for maximizing reproducibility of ecological niche models

Reporting specific modelling methods and metadata is essential to the reproducibility of ecological studies, yet guidelines on what information should be reported rarely exist. Here, we address this issue for ecological niche modelling (also called species distribution modelling), a rapidly developing toolset in ecology used across many aspects of biodiversity science. Our quantitative review of the recent literature reveals that studies generally provide too little information for the work to be fully reproduced. Over two-thirds of the examined studies neglected to report the version or access date of the underlying data, and only half reported model parameters. To address this problem, we propose adopting a checklist to guide studies in reporting at least the minimum information necessary for ecological niche modelling reproducibility, offering a straightforward way to balance efficiency and accuracy. We encourage the ecological niche modelling community, as well as journal reviewers and editors, to use and further develop this framework to facilitate and improve the reproducibility of future work. The proposed checklist framework is generalizable to other areas of ecology, especially those utilizing biodiversity data, environmental data and statistical modelling, and could also be adopted by a broader array of disciplines.

The authors evaluate the reproducibility of ecological niche modelling literature and provide a checklist of crucial items for more reproducible ecological niche models.
