Multivariate exploratory analysis of ordinal data in ecology: Pitfalls, problems and solutions

Abstract

Questions: Are ordinal data appropriately treated by multivariate methods in numerical ecology? If not, what are the most common mistakes? Which dissimilarity coefficients, ordination and classification methods are best suited to ordinal data? Should we worry about such problems at all?

Methods: A new classification model family, OrdClAn (Ordinal Cluster Analysis), is suggested for hierarchical and non-hierarchical classifications from ordinal ecological data, e.g. the abundance/dominance scores that are commonly recorded in relevés. During the clustering process, the objects are grouped so as to minimize a measure calculated from the ranks of within-cluster and between-cluster distances or dissimilarities.

Results and Conclusions: Evaluation of the various steps of exploratory data analysis of ordinal ecological data shows that consistency of methodology throughout the study is of primary importance. In an optimal situation, each methodological step is order invariant. This property ensures that the results are independent of changes that do not affect ordinal relationships, and guarantees that no illusory precision is introduced into the analysis. However, the multivariate procedures most commonly applied in numerical ecology do not satisfy these requirements and are therefore not recommended. For example, it is inappropriate to analyse Braun-Blanquet abundance/dominance data by methods assuming that Euclidean distance is meaningful. The solution is to use a dissimilarity coefficient that is compatible with ordinal variables, followed by an ordination or clustering method that considers only the rank order of dissimilarities. A range of artificial data sets exemplifying different subtypes of ordinal variables, e.g. indicator values or species scores from relevés, illustrates the advocated approach. Detailed analyses of an actual phytosociological data set demonstrate the classification of relevés and species by OrdClAn and the subsequent tabular rearrangement, in a numerical study that remains within the ordinal domain from the first step to the last.

Abbreviations: AD = Abundance/Dominance; CL = Complete Link; DC = Coefficient of Discordance; ED = Euclidean distance; O = Ordinal; M = Metric; NMDS = Non-metric Multidimensional Scaling; OC = Ordinal Clustering; SL = Single Link; UPGMA = Unweighted Pair Group Method or Group Average Clustering.
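
The order-invariance argument can be illustrated with a small sketch. The relevé scores, the recoding to cover-class midpoints, and the `discordance` function below are all assumptions made for illustration; in particular, `discordance` is a simple rank-based proportion and is not claimed to be the criterion that OrdClAn minimizes. The first part shows that Euclidean distance on AD scores changes the rank order of dissimilarities when the scores are recoded monotonically. The second part shows a quantity that depends only on whether within-cluster dissimilarities exceed between-cluster ones.

```python
from itertools import combinations
from math import dist

# Hypothetical relevés scored for two species on a 1-5 ordinal AD scale.
scores = {"A": (1, 1), "B": (2, 2), "C": (4, 1), "D": (5, 1)}

# Illustrative monotone recoding to cover-class midpoints (%). This mapping is
# an assumption; any order-preserving recoding would serve the same purpose.
midpoint = {1: 2.5, 2: 15.0, 3: 37.5, 4: 62.5, 5: 87.5}
recoded = {k: tuple(midpoint[s] for s in v) for k, v in scores.items()}


def pair_ranking(data):
    """Return relevé pairs sorted by increasing Euclidean distance."""
    pairs = combinations(sorted(data), 2)
    return sorted(pairs, key=lambda p: dist(data[p[0]], data[p[1]]))


def discordance(data, clusters):
    """Proportion of (within, between) pair comparisons in which the
    within-cluster distance exceeds the between-cluster distance.
    Hypothetical rank-based measure, used here only as a sketch."""
    label = {r: i for i, c in enumerate(clusters) for r in c}
    within, between = [], []
    for a, b in combinations(sorted(data), 2):
        d = dist(data[a], data[b])
        (within if label[a] == label[b] else between).append(d)
    bad = sum(w > b for w in within for b in between)
    return bad / (len(within) * len(between))


# The closest pair differs between the two codings of the same ordinal data,
# so the first fusion of an agglomerative method would differ as well.
print(pair_ranking(scores)[0])   # closest pair under raw AD scores
print(pair_ranking(recoded)[0])  # closest pair under midpoint recoding

# The rank-based measure only uses order comparisons between distances.
print(discordance(scores, [{"A", "B"}, {"C", "D"}]))
print(discordance(scores, [{"A", "C"}, {"B", "D"}]))
```

In this toy case the pair that would be fused first depends on the arbitrary numerical coding of the scores, which is the kind of illusory precision the abstract warns about. A measure built purely from order comparisons is unaffected by any monotone transformation of the dissimilarities themselves.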
