Comparison of distance-based and model-based ordinations.

Distance-based ordinations have played a critical role in community ecology for more than half a century, but are still under active development. These methods employ a matrix of pairwise distances or dissimilarities between sample units, and map sample units from the high-dimensional distance or dissimilarity space to a low-dimensional representation for analysis. Distance- or dissimilarity-based methods employ continuum or gradient ecological theory and a variety of statistical models to achieve the mapping. Recently, ecologists have developed model-based ordinations based on latent vectors and individual species response models. These methods employ the individualistic perspective of Gleason as the ecological model, and Bayesian or maximum likelihood methods to estimate the parameters of the low-dimensional space represented by the latent vectors. In this research I compared two distance-based methods (NMDS and t-SNE) with two model-based methods (BORAL and REO) on five data sets to determine which methods are superior for (1) extracting meaningful ecological drivers of variability in community composition; and (2) estimating sample unit locations in ordination space that maximize the goodness-of-fit of individual species response models to the estimated sample unit locations. Environmental variables and species were fitted to the ordinations by Generalized Additive Models (GAMs) with Gaussian, negative binomial, or Poisson distribution models as appropriate. Across the five data sets, 22 models of environmental variability and 449 models of species distributions were calculated for each of the ordination methods. To minimize the effects of stochasticity, the entire analysis was replicated three times and the results were averaged across the replicates. Results were evaluated by deviance explained and AIC for environmental variables and species distributions, averaged by ordination method for each data set, and ranked from best to worst.
In the four assessments, distance-based methods ranked first and second in three cases, and first and third in one case, significantly outperforming the model-based methods. t-SNE was the top-performing method, outperforming NMDS especially on the more complex data sets. In general, the gradient-based theoretical basis and data sufficiency of distance-based methods allowed them to outperform model-based methods in every assessment.
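
The evaluation step described above (deviance explained, averaged across replicates, then ranked) can be illustrated with a minimal sketch. It is not the study's code: it assumes a Poisson response, takes the fitted values as given rather than fitting a GAM, and uses hypothetical function names (`poisson_deviance`, `deviance_explained`, `rank_methods`) and placeholder method labels with arbitrary numbers that are not results from the study.

```python
import math
from statistics import mean


def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu)); the log term is 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def deviance_explained(y, mu):
    """1 - residual deviance / null deviance, where the null model predicts mean(y) everywhere."""
    null_mu = [mean(y)] * len(y)
    return 1.0 - poisson_deviance(y, mu) / poisson_deviance(y, null_mu)


def rank_methods(scores):
    """Average each method's per-replicate scores and return method names, best first."""
    averaged = {method: mean(values) for method, values in scores.items()}
    return sorted(averaged, key=averaged.get, reverse=True)


# Placeholder labels and arbitrary values (NOT results from the study):
# one deviance-explained score per replicate for each hypothetical method.
example_scores = {
    "A": [0.2, 0.4],
    "B": [0.5, 0.7],
    "C": [0.1, 0.1],
}
print(rank_methods(example_scores))  # ['B', 'A', 'C']
```

Ranking by AIC would follow the same pattern with the sort order reversed, since lower AIC indicates the better model.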
