A statistical explanation of MaxEnt for ecologists

MaxEnt is a program for modelling species distributions from presence-only species records. This paper is written for ecologists and describes the MaxEnt model from a statistical perspective, making explicit links between the structure of the model, decisions required in producing a modelled distribution, and knowledge about the species and the data that might affect those decisions. To begin, we discuss the characteristics of presence-only data, highlighting implications for modelling distributions. We particularly focus on the problems of sample bias and lack of information on species prevalence. The keystone of the paper is a new statistical explanation of MaxEnt, which shows that the model minimizes the relative entropy between two probability densities (one estimated from the presence data and one from the landscape) defined in covariate space. For many users, this viewpoint is likely to be a more accessible way to understand the model than previous ones that rely on machine learning concepts. We then step through a detailed explanation of MaxEnt, describing key components (e.g. covariates and features, and definition of the landscape extent), the mechanics of model fitting (e.g. feature selection, constraints and regularization) and outputs. Using case studies for a Banksia species native to south-west Australia and a riverine fish, we fit models and interpret them, exploring why certain choices affect the result and what this means. The fish example illustrates use of the model with vector data for linear river segments rather than raster (gridded) data. Appropriate treatments for survey bias, unprojected data, locally restricted species, and predicting to environments outside the range of the training data are demonstrated, and new capabilities are discussed. Online appendices include additional details of the model and the mathematical links between previous explanations and this one, example code and data, and further information on the case studies.
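To make the relative-entropy idea concrete, the following is a minimal sketch rather than the MaxEnt program or its fitting procedure. It assumes a single covariate binned into four hypothetical classes, and it computes the relative entropy (Kullback-Leibler divergence) of a presence density from a landscape density. The names `f1`, `f`, `relative_entropy` and `normalize`, and all counts, are illustrative assumptions and do not come from the paper.

```python
import math


def normalize(counts):
    """Turn raw counts per covariate bin into a discrete density that sums to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def relative_entropy(f1, f):
    """Relative entropy of f1 with respect to f: sum of f1 * log(f1 / f).

    Bins where f1 is zero contribute nothing; a bin where f1 > 0 but f == 0
    is an error, because the landscape must cover every presence bin.
    """
    if len(f1) != len(f):
        raise ValueError("densities must be defined over the same bins")
    total = 0.0
    for p, q in zip(f1, f):
        if p > 0:
            if q <= 0:
                raise ValueError("presence density is positive where landscape density is zero")
            total += p * math.log(p / q)
    return total


# Hypothetical data: presence records and landscape (background) cells
# counted in four bins of one covariate, e.g. annual rainfall classes.
presence_counts = [1, 4, 10, 5]
landscape_counts = [250, 250, 250, 250]

f1 = normalize(presence_counts)   # density of the covariate at presence sites
f = normalize(landscape_counts)   # density of the covariate across the landscape

divergence = relative_entropy(f1, f)
print(f"relative entropy of presence vs landscape: {divergence:.4f}")
```

A value of zero would mean the presence records are spread across covariate space exactly as the landscape is; larger values indicate that the presences are concentrated in particular conditions. The sketch only evaluates this quantity for given densities. It does not perform the constrained, regularized minimization that the abstract attributes to model fitting.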
