Paper information - Multivariate exploratory analysis of ordinal data in ecology: Pitfalls, problems and solutions

Abstract

Questions: Are ordinal data appropriately treated by multivariate methods in numerical ecology? If not, what are the most common mistakes? Which dissimilarity coefficients, ordination and classification methods are best suited to ordinal data? Should we worry about such problems at all?

Methods: A new classification model family, OrdClAn (Ordinal Cluster Analysis), is suggested for hierarchical and non-hierarchical classifications from ordinal ecological data, e.g. the abundance/dominance scores that are commonly recorded in relevés. During the clustering process, the objects are grouped so as to minimize a measure calculated from the ranks of within-cluster and between-cluster distances or dissimilarities.

Results and Conclusions: Evaluation of the various steps of exploratory data analysis of ordinal ecological data shows that consistency of methodology throughout the study is of primary importance. In an optimal situation, each methodological step is order invariant. This property ensures that the results are independent of changes not affecting ordinal relationships, and guarantees that no illusory precision is introduced into the analysis. However, the multivariate procedures that are most commonly applied in numerical ecology do not satisfy these requirements and are therefore not recommended. For example, it is inappropriate to analyse Braun-Blanquet abundance/dominance data by methods assuming that Euclidean distance is meaningful. The solution to all of these problems is that the dissimilarity coefficient should be compatible with ordinal variables and the subsequent ordination or clustering method should consider only the rank order of dissimilarities. A range of artificial data sets exemplifying different subtypes of ordinal variables, e.g. indicator values or species scores from relevés, illustrate the advocated approach. Detailed analyses of an actual phytosociological data set demonstrate the classification by OrdClAn of relevés and species and the subsequent tabular rearrangement, in a numerical study remaining within the ordinal domain from the first step to the last.

Abbreviations: AD = Abundance/Dominance; CL = Complete Link; DC = Coefficient of Discordance; ED = Euclidean distance; O = Ordinal; M = Metric; NMDS = Non-metric Multidimensional Scaling; OC = Ordinal Clustering; SL = Single Link; UPGMA = Unweighted Pair Group Method or Group Average Clustering.
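
The order-invariance argument in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the four relevés, the class-midpoint recoding, and the function `rank_violation_ratio`, which is a hypothetical rank-based criterion and not the measure defined for OrdClAn in the paper. The sketch shows two things: Euclidean distance on ordinal scores changes its rank order under a monotone recoding of the scores, whereas a criterion that only compares distances is unaffected by a monotone transformation of those distances.

```python
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def pairwise(data, dist=euclidean):
    """Return {(i, j): distance} for every pair of objects with i < j."""
    return {(i, j): dist(data[i], data[j])
            for i, j in combinations(range(len(data)), 2)}


def rank_violation_ratio(dists, labels):
    """Hypothetical rank-based criterion (not the paper's definition).

    Fraction of (within-cluster, between-cluster) distance pairs in which
    the within-cluster distance is strictly larger. Only comparisons
    between distances are used, never their magnitudes.
    """
    within = [d for (i, j), d in dists.items() if labels[i] == labels[j]]
    between = [d for (i, j), d in dists.items() if labels[i] != labels[j]]
    if not within or not between:
        return 0.0
    violations = sum(1 for w in within for b in between if w > b)
    return violations / (len(within) * len(between))


# Four hypothetical relevés, two species, scores on a 1-5 ordinal scale.
raw = [(1, 3), (3, 3), (4, 4), (5, 5)]

# Illustrative monotone recoding of the scores (assumed class midpoints).
midpoint = {1: 2.5, 2: 15.0, 3: 37.5, 4: 62.5, 5: 87.5}
recoded = [tuple(midpoint[s] for s in relevé) for relevé in raw]

d_raw = pairwise(raw)
d_rec = pairwise(recoded)

# The order of two Euclidean distances flips under the recoding,
# although no ordinal relationship among the scores has changed.
flip = d_raw[(0, 1)] > d_raw[(2, 3)] and d_rec[(0, 1)] < d_rec[(2, 3)]

labels = [0, 0, 1, 1]  # an assumed partition into two clusters
ratio_raw = rank_violation_ratio(d_raw, labels)
# Squaring is a monotone transformation of the distances themselves.
ratio_squared = rank_violation_ratio(
    {pair: d ** 2 for pair, d in d_raw.items()}, labels)
ratio_rec = rank_violation_ratio(d_rec, labels)

print(flip)           # True
print(ratio_raw)      # 0.125
print(ratio_squared)  # 0.125
print(ratio_rec)      # 0.0
```

In this toy case the criterion is identical for the distances and their squares, but it changes once the scores are recoded before Euclidean distance is computed. The rank-based clustering step is therefore not sufficient on its own; the dissimilarity coefficient must also be compatible with ordinal variables, as the abstract states.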

János Podani | J. Podani
