Random subset feature selection for ecological niche models of wildfire activity in Western North America

Variable selection in ecological niche modelling can influence model projections as much as the choice of future climate scenario does. It is therefore important to select feature (variable) subsets that both optimize model performance and characterize the variability among models. We use a novel random subset feature selection algorithm (RSFSA) for niche modelling to select an ensemble of optimally sized feature subsets of limited correlation (|r| < 0.7) from 90 climatic, topographic and anthropogenic indices, and use these subsets to build higher-performing models of wildfire activity for western North America. Monitoring Trends in Burn Severity and LANDFIRE wildfire data were used to develop thousands of MaxEnt, GLM and Glmnet models. The RSFSA-selected models performed better than models built from random feature subsets: they had higher accuracy (area under the curve statistic; AUC), lower complexity (corrected Akaike Information Criterion; AICc) and, in some cases, lower overfitting (AUCdiff). The RSFSA-selected MaxEnt models with quadratic/hinge features (β-regularization 2) generally had higher AUC and lower AICc than the other niche model parameterizations and methods. Feature subset ensembles of RSFSA-selected 15-variable MaxEnt quadratic/hinge models were used to characterize variability in the projected areas of large wildfires for three burn severities under current, 2050 and 2070 climate scenarios. Expert screening of variables before applying the RSFSA did not improve model performance. Widespread contemporary wildfire deficits and projected regional changes in wildfires highlight the need to manage fuel loads and restore natural fire regimes. The RSFSA is valuable for optimizing niche model performance and for generating feature subset ensembles that characterize model variability across feature subset sizes, modelling methods and climate scenarios.
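
The random-subset step with a correlation limit can be illustrated with a minimal sketch. This is not the authors' RSFSA implementation: it only shows one plausible way to draw feature subsets of a fixed size whose pairwise |r| stays below 0.7, and it omits the later ranking of subsets by AUC, AICc and AUCdiff. The function names (`pearson_r`, `draw_subset`), the greedy draw over a shuffled variable list, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def draw_subset(columns, k, max_abs_r=0.7, rng=None, max_tries=1000):
    """Draw k variable names at random so that every pair has |r| < max_abs_r.

    columns: dict mapping a variable name to its list of values.
    Hypothetical helper: a greedy pass over a shuffled list, retried on failure.
    """
    rng = rng or random.Random()
    cache = {}  # pairwise |r| values, computed only when first needed

    def abs_r(a, b):
        key = (a, b) if a < b else (b, a)
        if key not in cache:
            cache[key] = abs(pearson_r(columns[a], columns[b]))
        return cache[key]

    for _ in range(max_tries):
        order = list(columns)
        rng.shuffle(order)
        chosen = []
        for name in order:
            # Accept a variable only if it is weakly correlated with all chosen so far.
            if all(abs_r(name, other) < max_abs_r for other in chosen):
                chosen.append(name)
                if len(chosen) == k:
                    return chosen
    raise ValueError("no subset of size k satisfies the correlation limit")


# Synthetic stand-in for environmental indices: six independent variables
# plus two near-duplicates that should never be selected with their originals.
rng = random.Random(0)
base = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(6)]
columns = {f"v{i}": col for i, col in enumerate(base)}
columns["v0_copy"] = [x + 0.05 * rng.gauss(0, 1) for x in base[0]]
columns["v1_copy"] = [x + 0.05 * rng.gauss(0, 1) for x in base[1]]

subset = draw_subset(columns, k=4, rng=rng)
ensemble = [draw_subset(columns, k=4, rng=rng) for _ in range(5)]
```

Under these assumptions, each subset in `ensemble` would then be used to fit one niche model, and the spread of the resulting projections gives a rough picture of the variability attributable to feature choice.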
