Evaluating classification strategies

This contribution is a comment on the paper by Belbin & McDonald (1993) on the comparison of three classification strategies for use in ecology. There are two problems in evaluating clustering methods: whether the sample adequately reflects the population structure, and what the nature of the clusters sought is. First, one has to decide on the number of clusters to be obtained. Possibly the best approach of all is the Bayesian coding theory for inductive inference. Second, the nature of the clusters sought may depend on the objectives of the clustering, which can be manifold. Phytosociologists do not agree on the nature of the clusters they seek, and are reluctant to provide a formal definition of their clusters. As a method for identifying gradients, Correspondence Analysis has had some success, so that a classification method largely based on it, notably TWINSPAN, may better reflect what phytosociologists are intuitively seeking than alternative variance minimisation methods. Additionally, TWINSPAN incorporates characterisation through indicator species. Perhaps we are more interested in these differentiating species than in the existence of clusters per se.

[1] Enrico Feoli, et al. Application of Probabilistic Methods in the Analysis of Phytosociological Data, 1991.

[2] Andrew K. C. Wong, et al. PFS Clustering Method, 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Helge Bruelheide, et al. Arranging phytosociological tables by species‐relevé groups, 1994.

[4] J. Friedman, et al. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests, 1979.

[5] C. S. Wallace, et al. Estimation and Inference by Compact Coding, 1987.

[6] J. Hartigan, et al. The runt test for multimodality, 1992.

[7] Anil K. Jain, et al. Validity studies in clustering methodologies, 1979, Pattern Recognit.

[8] M. B. Dale, et al. Knowing When to Stop: Cluster Concept - Concept Cluster, 1991.

[9] M. Fligner, et al. Multistage Ranking Models, 1988.

[10] G. W. Milligan, et al. An examination of procedures for determining the number of clusters in a data set, 1985.

[11] P. C. Young, et al. Probabilistic tests and stopping rules associated with hierarchical classification techniques, 1979.

[12] R. Tibshirani, et al. Adaptive Principal Surfaces, 1994.

[13] Geoffrey J. McLachlan, et al. The mixture method of clustering applied to three-way data, 1985.

[14] G. W. Milligan, et al. The Effect of Cluster Size, Dimensionality, and the Number of Clusters on Recovery of True Cluster Structure, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15] N. E. Day. Estimating the components of a mixture of normal distributions, 1969.

[16] Anil K. Jain, et al. Clustering Methodologies in Exploratory Data Analysis, 1980, Adv. Comput.

[17] J. Hartigan, et al. The Dip Test of Unimodality, 1985.

[18] C. S. Wallace, et al. An Information Measure for Classification, 1968, Comput. J.

[19] Lloyd D. Fisher, et al. Approximate confidence intervals for the number of clusters, 1989.

[20] Lee Belbin, et al. Comparing three classification strategies for use in ecology, 1993.

[21] Ryszard S. Michalski, et al. Pattern Recognition as Rule-Guided Inductive Inference, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] M. B. Dale, et al. Inosculate Analysis of Vegetation Data, 1973.