Opportunities for improved distribution modelling practice via a strict maximum likelihood interpretation of MaxEnt

Maximum entropy (MaxEnt) modelling, as implemented in the Maxent software, has rapidly become one of the most popular methods for distribution modelling. MaxEnt was originally described as a machine-learning method; more recently, it has been explained from principles of Bayesian estimation. MaxEnt offers users numerous options (variants of the method) and settings (tuning of parameters). Accepting the Maxent software's default options and settings has become widespread practice, most likely because ecologists are unfamiliar with machine-learning and Bayesian statistical concepts and because default models are so easily obtained in Maxent. However, these defaults have been shown to be suboptimal in many cases, and exploration of alternatives has repeatedly been called for. In this paper, we derive MaxEnt from strict maximum likelihood principles and point out parallels between MaxEnt and standard modelling tools such as generalised linear models (GLM). Furthermore, we describe several new options opened by this derivation, which may improve MaxEnt practice. The most important of these is the option of selecting variables by subset selection methods instead of the l1-regularisation method, which is currently the Maxent software default. Other new options include: incorporation of new transformations of explanatory variables and user control of the transformation process; improved variable contribution measures and options for variation partitioning; and improved output prediction formats. The new options are illustrated with a data set for the plant species Scorzonera humilis in SE Norway, which was analysed with the standard MaxEnt procedure in a previously published paper. We recommend that the proposed alternative options be thoroughly compared with the default procedures and variants thereof.
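The maximum likelihood reading of MaxEnt and the idea of subset selection can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not the Maxent software's algorithm or the exact procedure of the paper: it assumes a study area divided into cells, presence records that are a subset of those cells, a log-linear (Gibbs) model over cells fitted by plain gradient ascent, and forward selection with an AIC-style threshold on the log-likelihood gain. The function names, the toy data and the threshold are all hypothetical.

```python
import math

def _scores(lam, cols, background):
    # Linear predictor lam . f(x) for every background cell, using columns `cols`.
    return [sum(l * x[j] for l, j in zip(lam, cols)) for x in background]

def log_likelihood(lam, cols, background, presences):
    # Log-likelihood of the presence cells under p(x) = exp(lam . f(x)) / Z,
    # where Z sums over all background cells (presences are among them).
    s = _scores(lam, cols, background)
    m = max(s)
    log_z = m + math.log(sum(math.exp(v - m) for v in s))
    return sum(s[i] - log_z for i in presences)

def fit(cols, background, presences, n_iter=2000, step=0.5):
    # Gradient ascent on the mean log-likelihood. The gradient is the empirical
    # feature mean at presences minus the model's expected feature value.
    n = len(presences)
    emp = [sum(background[i][j] for i in presences) / n for j in cols]
    lam = [0.0] * len(cols)
    for _ in range(n_iter):
        s = _scores(lam, cols, background)
        m = max(s)
        w = [math.exp(v - m) for v in s]
        z = sum(w)
        model = [sum(wi * x[j] for wi, x in zip(w, background)) / z for j in cols]
        lam = [l + step * (e - q) for l, e, q in zip(lam, emp, model)]
    return lam

def forward_select(background, presences, min_gain=1.0):
    # Forward subset selection: add the variable with the largest log-likelihood
    # gain, stop when the gain is at most `min_gain`. A gain of 1 per parameter
    # corresponds to an AIC improvement (assumption made for this sketch).
    selected = []
    remaining = list(range(len(background[0])))
    best_ll = log_likelihood([], [], background, presences)
    while remaining:
        trials = []
        for j in remaining:
            cols = selected + [j]
            lam = fit(cols, background, presences)
            trials.append((log_likelihood(lam, cols, background, presences), j))
        ll, j = max(trials)
        if ll - best_ll <= min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_ll = ll
    return selected, best_ll

# Hypothetical toy data: 33 cells on a grid; variable 0 is an environmental
# gradient scaled to [0, 1], variable 1 is unrelated to the presences.
background = [(x / 10, y / 2) for x in range(11) for y in range(3)]
presences = [x * 3 + y for x in range(7, 11) for y in range(3)]

selected, ll = forward_select(background, presences)
print("selected variables:", selected)
```

In this toy setting the presences are concentrated at the high end of variable 0 and spread evenly over variable 1, so only variable 0 can raise the likelihood. Features are scaled to [0, 1] so that the fixed step size stays stable; real applications would need a proper optimiser and a selection criterion chosen with care.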
