Variable selection and accurate predictions in habitat modelling: a shrinkage approach

Habitat modelling is increasingly relevant in biodiversity and conservation studies. A typical application is to predict potential zones of specific conservation interest. With many environmental covariates, a large number of candidate models can be investigated, but multi-model inference may become impractical. Shrinkage regression overcomes this issue by combining the identification of relevant covariates with accurate estimation of their effect sizes for prediction. In a Bayesian framework, we investigated the use of a shrinkage prior, the Horseshoe, for variable selection in spatial generalized linear models (GLMs). As case studies, we considered five datasets on small pelagic fish abundance in the Gulf of Lion (Mediterranean Sea, France) and nine environmental inputs. We compared the predictive performance of five models: a simple kriging model, a full spatial GLM with independent normal priors on the regression coefficients, a full spatial GLM with a Horseshoe prior on the regression coefficients, and two zero-inflated models (spatial and non-spatial) with a Horseshoe prior. Predictive performance was evaluated by cross-validation on a hold-out subset of the data: models with a Horseshoe prior performed best, and the full model with independent normal priors performed worst. As the number of inputs increased, extrapolation quickly became pervasive because predictions had to be made from novel combinations of covariate values. By shrinking regression coefficients with a Horseshoe prior, only one model needed to be fitted to the data to obtain reasonable and accurate predictions, including extrapolations.
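
To illustrate why a Horseshoe prior can act as a variable selector, the following minimal Python sketch simulates the shrinkage weights it implies. It is not the spatial GLM fitted in this study: it assumes the standard Horseshoe formulation, in which each coefficient has a half-Cauchy local scale, and it further assumes unit noise variance and a global scale `tau` fixed at 1. Under those assumptions the shrinkage weight of a coefficient is 1 / (1 + tau^2 * lambda^2), where 0 means no shrinkage and 1 means the coefficient is shrunk entirely to zero. All function names are hypothetical.

```python
import math
import random


def half_cauchy(rng):
    """Draw from a standard half-Cauchy C+(0, 1) by inverse-CDF sampling."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))


def horseshoe_shrinkage_weights(n_draws, tau=1.0, seed=0):
    """Simulate prior shrinkage weights kappa = 1 / (1 + tau^2 * lambda^2).

    Assumes unit noise variance and a fixed global scale tau (both are
    simplifying assumptions made for this illustration only).
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n_draws):
        lam = half_cauchy(rng)  # local scale of one regression coefficient
        weights.append(1.0 / (1.0 + (tau * lam) ** 2))
    return weights


def fraction_in_extremes(weights, margin=0.1):
    """Share of weights within `margin` of 0 (kept) or 1 (shrunk away)."""
    n_extreme = sum(1 for k in weights if k < margin or k > 1.0 - margin)
    return n_extreme / len(weights)


kappa = horseshoe_shrinkage_weights(20000)
extreme_share = fraction_in_extremes(kappa)
print(f"share of shrinkage weights near 0 or 1: {extreme_share:.2f}")
```

In this simplified setting the weights pile up near 0 and near 1, which gives the prior its horseshoe shape: a coefficient tends either to be left almost untouched or to be shrunk almost entirely. An independent normal prior with unit variance would, under the same assumptions, give every coefficient the same weight of 0.5.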
