Model‐based assessment of ecological community classifications

Aim: If a classification is to serve as an informative surrogate for biodiversity, a ‘good’ classification should provide information about the composition and abundance of the species within communities. A natural way to formalize this is with a predictive model, where group membership (clusters) is the predictor and multivariate species data (a site-by-species matrix) is the response. In this study, we aimed to develop a model-based framework for evaluating the predictive performance of alternative classifications of vegetation communities, and to apply it to make objective and automated decisions about classification structure.

Methods: We used GLMs fit to multivariate species data to predict the occurrence of individual species from site groupings. We used AIC to estimate the predictive performance of alternative models in order to: (1) identify the optimal partitioning of sites among multiple competing flexible-β clustering solutions; (2) identify the species that contribute most to compositional differences between clusters (i.e. characteristic species); and (3) automatically merge clusters to maximize expected predictive performance using an iterative pruning approach. Using field data from southeastern Australia, and simulated data, we demonstrate our approach for common ecological data types (presence/absence, counts, cover–abundance scores, percentage cover). We supply all code and data required for these analyses.

Results: AIC was a useful metric for assessing competing classification solutions. Our method produced outputs that were simple to interpret and required few subjective choices by the user, while performing similarly to the popular OptimClass assessment methodology. Characteristic species defined by predictive performance were consistent between data types, and agreed well in general with existing methods for defining characteristic species. Using model performance to iteratively refine clustering produced classifications with better expected predictive performance than those taken from the dendrogram hierarchy, although the flexible-β hierarchy did a reasonable job of improving predictive performance.

Conclusions: Appropriately specified models are a natural way to maximize the predictive performance of a classification and its associated diagnostics. We show that a model-based assessment provides a clear decision framework based on data type, offering an objective pathway for making classification assessment decisions, as well as for evaluating methodological choice and performance.
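The AIC comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' supplied code. It assumes presence/absence data and uses the fact that a binomial GLM with cluster membership as its only (categorical) predictor has fitted probabilities equal to the per-cluster prevalence, so the maximised log-likelihood can be computed directly. The function names, the toy site-by-species matrix, and the three candidate groupings are all hypothetical.

```python
import math


def bernoulli_loglik(presences, n_sites):
    """Maximised Bernoulli log-likelihood for one species in one cluster."""
    if presences == 0 or presences == n_sites:
        # Fitted probability is 0 or 1, so each observation has likelihood 1.
        return 0.0
    p = presences / n_sites
    return presences * math.log(p) + (n_sites - presences) * math.log(1.0 - p)


def species_aic(column, clusters):
    """AIC of a cluster-only binomial model for one species (one parameter per cluster)."""
    labels = sorted(set(clusters))
    loglik = 0.0
    for g in labels:
        members = [y for y, c in zip(column, clusters) if c == g]
        loglik += bernoulli_loglik(sum(members), len(members))
    return -2.0 * loglik + 2.0 * len(labels)


def classification_aic(sites_by_species, clusters):
    """Sum of per-species AIC values for one candidate classification."""
    n_species = len(sites_by_species[0])
    return sum(
        species_aic([row[j] for row in sites_by_species], clusters)
        for j in range(n_species)
    )


# Hypothetical toy data: 6 sites (rows) by 3 species (columns), presence/absence.
sites_by_species = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

# Hypothetical competing classifications of the same six sites.
candidates = {
    "one_group": [0, 0, 0, 0, 0, 0],
    "two_groups": [0, 0, 0, 1, 1, 1],
    "three_groups": [0, 0, 0, 1, 1, 2],
}

scores = {name: classification_aic(sites_by_species, cl) for name, cl in candidates.items()}
best = min(scores, key=scores.get)  # lowest summed AIC = best expected predictive performance

for name, value in scores.items():
    print(f"{name}: summed AIC = {value:.2f}")
print("preferred classification:", best)
```

In this toy example the two-group solution has the lowest summed AIC: splitting further adds parameters without improving fit, while lumping all sites together loses the signal in the first two species. The same per-species AIC values could in principle be compared against a single-group model to rank candidate characteristic species, or recomputed after merging a pair of clusters to mimic a pruning step; count, cover–abundance, or percentage-cover data would need a different error distribution than the Bernoulli model assumed here.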
