The MIAmaxent R package: Variable transformation and model selection for species distribution models

Abstract

The widely used “Maxent” software for modeling species distributions from presence‐only data (Phillips et al., Ecological Modelling, 190, 2006, 231) tends to produce models with high predictive performance but low ecological interpretability, and the implications of Maxent's statistical approach to variable transformation, model fitting, and model selection remain underappreciated. In particular, Maxent's approach to model selection through lasso regularization has been shown to give less parsimonious distribution models (that is, models which are more complex but not necessarily predictively better) than subset selection. In this paper, we introduce the MIAmaxent R package, which provides a statistical approach to modeling species distributions similar to Maxent's, but with subset selection instead of lasso regularization. The simpler models typically produced by subset selection are more ecologically interpretable, and grounding distribution models more firmly in ecological theory is a fundamental motivation for using MIAmaxent. To that end, the package performs variable transformation based on expected occurrence–environment relationships and contains tools for exploring data and interrogating models in light of knowledge of the modeled system. Additionally, MIAmaxent implements two different kinds of model fitting: maximum entropy fitting for presence‐only data and logistic regression (GLM) for presence–absence data. Unlike Maxent, MIAmaxent decouples variable transformation, model fitting, and model selection, which facilitates methodological comparisons and gives the modeler greater flexibility when choosing a statistical approach to a given distribution modeling problem.
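To make the idea of subset selection concrete, the following is a minimal sketch of greedy forward selection in plain Python. It is an illustration only and is not MIAmaxent's procedure or API: it assumes ordinary least squares in place of maximum entropy or logistic fitting, and a fixed threshold on the relative reduction in residual sum of squares (`min_gain`, a hypothetical parameter) in place of a formal test between nested models. The function names and the toy data are likewise invented for the example.

```python
from itertools import product


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rss(rows, y, cols):
    """Residual sum of squares of a least-squares fit (with intercept) on the given columns."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, yi in zip(design, y))


def forward_select(rows, y, min_gain=0.05):
    """Add one variable at a time; stop when the best candidate improves the fit too little."""
    selected, remaining = [], list(range(len(rows[0])))
    current = rss(rows, y, selected)
    while remaining:
        best = min(remaining, key=lambda c: rss(rows, y, selected + [c]))
        new = rss(rows, y, selected + [best])
        if current <= 0 or (current - new) / current < min_gain:
            break  # no candidate is worth the added complexity
        selected.append(best)
        remaining.remove(best)
        current = new
    return selected


# Toy data: three candidate variables on a full factorial grid. The response
# depends on variables 0 and 1 only; the small extra term is unrelated to any
# single variable.
rows = [list(map(float, p)) for p in product((-1, 1), repeat=3)]
y = [1 + 2 * a - 1.5 * b + 0.1 * a * b * c for a, b, c in rows]

selected = forward_select(rows, y)
print(selected)  # [0, 1]
```

A variable is either in the model or out of it, so the result here is a two-variable model with the uninformative third variable excluded entirely. Lasso regularization instead shrinks all coefficients toward zero with a penalty, which is the contrast with subset selection that the abstract draws.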
