Species distribution modeling: a statistical review with focus in spatio-temporal issues

The use of complex statistical models to describe species distributions has increased substantially in recent years. This complexity makes inference and prediction challenging. The Bayesian approach has become a good option for dealing with these models for two reasons: prior information can be incorporated easily, and it provides a more realistic and accurate estimation of uncertainty. In this paper, we first review the sources of information and the different approaches (frequentist and Bayesian) to modeling the distribution of a species. We then discuss the Integrated Nested Laplace Approximation (INLA) as a tool for obtaining the marginal posterior distributions of the parameters involved in these models. Finally, we discuss some important statistical issues that arise when researchers work with species data:

- the presence of a temporal effect, with different spatial and spatio-temporal structures;
- preferential sampling;
- spatial misalignment;
- non-stationarity;
- imperfect detection;
- excess zeros.
