Model-based thinking for community ecology

We make the case for model-based approaches to the analysis of community data, in which a statistical model is specified directly for the observed multivariate data. Recent advances in statistical modelling make it possible to build models that are appropriate for the data and that address key ecological questions in a statistically coherent manner. The main advantages of this approach are interpretability, flexibility, and efficiency, which we explain in detail and illustrate by example. We outline the steps in a model-based analysis, with an emphasis on features that arise in a multivariate context. A distinguishing feature of the model-based approach is its emphasis on diagnostic checking to ensure that the model agrees reasonably well with the observed data. Two examples illustrate how the approach can provide insights into ecological problems that were not previously available. In the first, we test for a treatment effect in a study where sampling intensity differed among sites, a difference we handle by adding an offset term to the model. In the second, we incorporate trait information into a model for an ordinal response in order to identify the main reasons why species differ in their environmental response.
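
The role of the offset term in the first example can be sketched with a small Poisson regression. The sketch below is illustrative only and is not the analysis reported in the paper: the counts, the effort values, and the function name `fit_poisson_offset` are hypothetical, and a single-species Poisson log-linear model fitted by Newton-Raphson stands in for whatever multivariate model the authors used. It assumes the expected count is proportional to sampling effort, so that log(effort) enters the linear predictor with a coefficient fixed at one.

```python
import math

def fit_poisson_offset(counts, treated, effort, n_iter=50, tol=1e-10):
    """Fit log(mu_i) = log(effort_i) + b0 + b1 * treated_i by Newton-Raphson.

    log(effort_i) is the offset: it is added to the linear predictor with a
    fixed coefficient of 1, so b0 and b1 describe rates per unit of effort.
    """
    b0 = math.log(sum(counts) / sum(effort))  # start at the pooled log rate
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # score vector (gradient of log-likelihood)
        h00 = h01 = h11 = 0.0    # Fisher information matrix entries
        for y, x, e in zip(counts, treated, effort):
            mu = e * math.exp(b0 + b1 * x)
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: treated sites were sampled more intensively.
counts  = [4, 6, 9, 11, 6, 13, 11, 6]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
effort  = [1, 1, 2, 2, 2, 4, 4, 2]   # e.g. number of samples pooled per site

b0, b1 = fit_poisson_offset(counts, treated, effort)

# Naive comparison that ignores effort: log ratio of mean raw counts.
mean_control = sum(c for c, t in zip(counts, treated) if t == 0) / 4
mean_treated = sum(c for c, t in zip(counts, treated) if t == 1) / 4
naive_effect = math.log(mean_treated / mean_control)

print(f"treatment effect with offset (log rate ratio): {b1:.4f}")
print(f"naive effect ignoring effort (log mean ratio): {naive_effect:.4f}")
```

With these made-up numbers the offset model estimates a log rate ratio of about -0.51 (fewer individuals per unit effort at treated sites), whereas the naive comparison of raw counts gives about +0.18, because the treated sites were simply sampled more. For a two-group design the fitted rates equal total count divided by total effort in each group, which gives a quick check on the iterative fit.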
