Predicting fish species richness in estuaries: Which modelling technique to use?

Four modelling techniques were compared and evaluated: generalized linear models (GLM), generalized additive models (GAM), classification and regression trees (CART) and boosted regression trees (BRT). Each method was used to model variation in fish species richness across several Portuguese estuarine systems. Models were compared on goodness-of-fit and on predictive performance assessed by cross-validation. The relative influence of the most important predictors in each of the four models was also examined. The fitted BRT, CART, GAM and GLM models accounted for 70.6%, 57.0%, 34.6% and 23.7% of total deviance, respectively. No single variable consistently accounted for the largest share of the deviance explained, although several variables were selected by all four models; their relative importance, however, varied widely between modelling techniques. The tree-based models (CART and BRT) outperformed GLM and GAM, presenting lower prediction errors after cross-validation. The limitations and usefulness of each technique are discussed.
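The two comparison criteria named in the abstract, percentage of deviance explained and cross-validated prediction error, can be illustrated with a minimal sketch. It is not the authors' analysis: the Poisson deviance is an assumption (a common choice for count data such as species richness), the salinity and richness values are invented toy data, and the one-split stump is only a stand-in for a tree-based model, compared here against an intercept-only baseline.

```python
import math

def poisson_deviance(observed, predicted):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)), taking 0 * log(0) as 0."""
    total = 0.0
    for y, mu in zip(observed, predicted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total

def percent_deviance_explained(observed, predicted):
    """Percentage of the null (intercept-only) deviance removed by the model."""
    mean_y = sum(observed) / len(observed)
    null_dev = poisson_deviance(observed, [mean_y] * len(observed))
    resid_dev = poisson_deviance(observed, predicted)
    return 100.0 * (1.0 - resid_dev / null_dev)

def fit_mean(train_x, train_y):
    """Intercept-only baseline: always predict the training mean."""
    m = sum(train_y) / len(train_y)
    return lambda x: m

def fit_stump(train_x, train_y):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(train_x))[1:]:
        left = [y for x, y in zip(train_x, train_y) if x < t]
        right = [y for x, y in zip(train_x, train_y) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def cross_validated_rmse(xs, ys, fit, k=5):
    """Fit on k-1 folds, predict the held-out fold, pool the squared errors."""
    sq_err = 0.0
    for fold in range(k):
        test_idx = set(range(fold, len(ys), k))  # interleaved folds, no shuffling
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        predict = fit(train_x, train_y)
        sq_err += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return math.sqrt(sq_err / len(ys))

# Hypothetical toy data: salinity at ten sites and the species richness observed there.
salinity = [2, 5, 8, 12, 15, 20, 25, 28, 32, 35]
richness = [3, 4, 3, 5, 9, 10, 11, 10, 12, 11]

stump = fit_stump(salinity, richness)
fitted = [stump(x) for x in salinity]
print("deviance explained (%):", round(percent_deviance_explained(richness, fitted), 1))
print("CV RMSE, stump:", round(cross_validated_rmse(salinity, richness, fit_stump), 2))
print("CV RMSE, mean :", round(cross_validated_rmse(salinity, richness, fit_mean), 2))
```

The first number measures fit to the data used for training, while the two RMSE values measure error on held-out sites, which is why a model can rank differently on the two criteria.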
