New approaches to modelling fish–habitat relationships

Ecologists often develop models that describe the relationship between faunal communities and their habitat. Coral reef fishes have been the focus of numerous such studies, which have used a wide range of statistical tools to answer an equally wide range of questions. Here, we apply both conventional statistical techniques (linear and generalized additive regression models) and novel machine-learning techniques (the support vector machine and three ensemble techniques used with regression trees) to predict fish species richness, biomass, and diversity from a range of habitat variables. We compare the techniques in terms of their predictive performance, and we compare a subset of the models in terms of the influence each habitat variable has on the predictions. Prediction errors are estimated by cross-validation, and variable importance is assessed by permuting the values of individual variables. For predictions of species richness and diversity, the tree-based models in general, and the random forest model in particular, are superior (produce the lowest errors). These model types are all able to model both nonlinear and interaction effects. The linear model, which can model neither effect type, performs the worst (produces the highest errors). For predictions of biomass, the generalized additive model is superior, and the support vector machine performs the worst. Depth range, the difference between maximum and minimum water depth at a given site, is identified as the most important variable in the majority of models predicting the three fish community variables. However, variable importance is highly dependent on model type, which raises questions about the interpretation of variable importance and its proper use as an indicator of causality. Representing ecological relationships with tree-based ensemble learners will improve predictive performance and provide a new avenue for exploring ecological relationships, both statistical and causal.
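The evaluation described above rests on two steps: cross-validated prediction error and permutation-based variable importance. The sketch below illustrates one common way to combine them, under stated assumptions. It is plain Python, it uses a k-nearest-neighbour regressor purely as a stand-in model (not one of the model types compared here), and it measures a variable's importance as the mean increase in held-out mean squared error when that variable's values are shuffled within each test fold. The function names (`kfold_indices`, `fit_knn`, `mse`, `cv_error_and_importance`) and the synthetic data are hypothetical; the number of folds, error metric, and number of permutations used in the study may differ.

```python
import math
import random


def kfold_indices(n, k, rng):
    """Shuffle row indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_knn(X, y, n_neighbors=5):
    """Hypothetical stand-in model: predict the mean y of the nearest training rows."""
    def predict(row):
        dists = sorted((math.dist(row, xi), yi) for xi, yi in zip(X, y))
        nearest = dists[:n_neighbors]
        return sum(yi for _, yi in nearest) / len(nearest)
    return predict


def mse(predict, X, y):
    """Mean squared prediction error over the rows of X."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def cv_error_and_importance(fit, X, y, k=5, seed=0):
    """Return (cross-validated MSE, per-variable permutation importance).

    Assumes len(y) >= k so that no fold is empty. The importance of
    variable j is the mean increase in held-out MSE when column j of the
    test fold is randomly permuted, breaking its link with y while the
    fitted model stays fixed.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    base_errs, increases = [], [[] for _ in range(p)]
    for test_idx in kfold_indices(n, k, rng):
        test_set = set(test_idx)
        train = [i for i in range(n) if i not in test_set]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        X_test = [list(X[i]) for i in test_idx]
        y_test = [y[i] for i in test_idx]
        base = mse(predict, X_test, y_test)
        base_errs.append(base)
        for j in range(p):
            col = [row[j] for row in X_test]
            rng.shuffle(col)
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X_test, col)]
            increases[j].append(mse(predict, X_perm, y_test) - base)
    cv_mse = sum(base_errs) / k
    importance = [sum(d) / k for d in increases]
    return cv_mse, importance


if __name__ == "__main__":
    # Synthetic, hypothetical data: y depends nonlinearly on x0 only
    # (a stand-in for an influential habitat variable); x1 is pure noise.
    rng = random.Random(42)
    X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
    y = [3 * math.sin(x0) + rng.gauss(0, 0.3) for x0, _ in X]
    cv_mse, imp = cv_error_and_importance(fit_knn, X, y, k=5)
    print(f"CV MSE: {cv_mse:.3f}")
    print("importance:", [round(v, 3) for v in imp])
```

Because the permutation is applied only to held-out rows of an already fitted model, the score reflects how much that particular model relies on a variable. Swapping in a different `fit` function can therefore change the importance ranking, which is consistent with the observation that variable importance depends on model type.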
