Using geometric and non-geometric internal evaluators to compare eight vegetation classification methods

Abstract

Questions: How similar are the solutions of eight commonly used vegetation classification methods? Which classification methods are most effective according to classification validity evaluators? How do evaluators with different optimality criteria differ in their assessments of classification efficacy? In particular, do evaluators which use geometric criteria (e.g. cluster compactness) and non-geometric evaluators (which rely on diagnostic species) offer similar classification evaluations?

Methods: We analysed classifications of two vegetation datasets produced by eight classification methods. Classification solutions were assessed with five geometric and four non-geometric internal evaluators. We formally introduce three new evaluators: (i) PARTANA, an intuitive variation on evaluators which use the ratio of within/between cluster dissimilarity as the optimality criterion; (ii) an adaptation of Morisita's index of niche overlap; and (iii) ISAMIC, an algorithm which measures the degree to which species are either always present or always absent within clusters.

Results and Conclusions: 1. With the exception of single linkage hierarchical clustering, classifications resulting from the eight methods were often similar. 2. Although evaluators varied in their assessment of the best overall classification method, they generally favored three hierarchical agglomerative clustering strategies: flexible beta (β = −0.25), average linkage, and Ward's linkage. 3. Among the introduced evaluators, PARTANA appears to be an effective geometric strategy which provides assessments similar to those of the C-index and Gamma evaluators. The non-geometric evaluators ISAMIC and Morisita's index demonstrate a strong bias for single linkage solutions. 4. Because non-geometric criteria are of interest to phytosociologists, there is a strong need for their continued development for use with vegetation classifications.
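The two kinds of optimality criteria contrasted in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation of PARTANA or ISAMIC: the function names, the toy data, and the exact formulas are assumptions chosen only to mirror the abstract's verbal descriptions. `within_between_ratio` is a geometric score built from the ratio of within-cluster to between-cluster dissimilarity, and `presence_absence_fidelity` is a non-geometric score that is highest when each species is either always present or always absent within each cluster.

```python
from itertools import combinations


def within_between_ratio(dissim, labels):
    """Geometric sketch: mean within-cluster dissimilarity divided by
    mean between-cluster dissimilarity (smaller means more compact,
    better separated clusters). Hypothetical formula, for illustration."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.append(dissim[i][j])
        else:
            between.append(dissim[i][j])
    if not within or not between:
        raise ValueError("need both within- and between-cluster pairs")
    return (sum(within) / len(within)) / (sum(between) / len(between))


def presence_absence_fidelity(presence, labels):
    """Non-geometric sketch: for every species/cluster pair, compute the
    constancy (fraction of plots in the cluster containing the species)
    and score it as 2 * |constancy - 0.5|, which is 1 when the species is
    always present or always absent and 0 when it occurs in half the
    plots. Returns the mean score. Assumed formula, for illustration."""
    clusters = sorted(set(labels))
    n_species = len(presence[0])
    scores = []
    for s in range(n_species):
        for c in clusters:
            members = [row[s] for row, lab in zip(presence, labels) if lab == c]
            constancy = sum(members) / len(members)
            scores.append(2 * abs(constancy - 0.5))
    return sum(scores) / len(scores)


# Toy data (invented): four plots, two species, two candidate partitions.
presence = [[1, 0], [1, 0], [0, 1], [0, 1]]
dissim = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
good_labels = [0, 0, 1, 1]  # groups plots with matching species
poor_labels = [0, 1, 0, 1]  # mixes the two compositions

ratio_good = within_between_ratio(dissim, good_labels)
ratio_poor = within_between_ratio(dissim, poor_labels)
fidelity_good = presence_absence_fidelity(presence, good_labels)
fidelity_poor = presence_absence_fidelity(presence, poor_labels)
```

On this toy example both scores prefer the first partition, but they use different information: the ratio needs only a dissimilarity matrix, whereas the fidelity score needs only the presence/absence table. That difference is one reason the two families of evaluators can disagree on real data.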
