Abstract

A common procedure for evaluating hierarchical clustering techniques is to compare the input data, expressed for example as a matrix of similarities or dissimilarities, with the output hierarchy expressed in matrix form. If an ordinary product-moment correlation is used for this comparison, the technique is known as cophenetic correlation and is frequently used by numerical taxonomists. A high correlation between the input similarities and the output dendrogram has been regarded as a criterion of a successful classification. This paper contains a Monte Carlo study of the characteristics of the cophenetic correlation and a related measure of agreement, both of which are interpreted in terms of generalized variance for several hierarchical clustering algorithms. The generalized variance criterion chosen for this study is Wilks' lambda, whose sampling distribution under the null hypothesis of identical group centroids is used here to define the degree of separation between clusters. Thus, a probabilistic approach is introduced into the evaluation procedure. Under this definition of the presence of clusters, the use of the cophenetic correlation and related measures of agreement as criteria of goodness of fit is shown to be quite misleading in most cases, because these measures vary widely when the separation between clusters is low.
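The two quantities discussed above can be illustrated with a minimal sketch. It is not the procedure used in the study: single linkage is picked here only as one example of a hierarchical algorithm, the data are restricted to two dimensions so that the determinants can be written out by hand, and all function names (`single_linkage_cophenetic`, `cophenetic_correlation`, `wilks_lambda_2d`) are hypothetical. Wilks' lambda is computed in its usual form as the ratio of the determinant of the pooled within-group scatter matrix to the determinant of the total scatter matrix.

```python
from itertools import combinations
from math import dist, sqrt


def pair(i, j):
    """Canonical key for an unordered pair of object indices."""
    return (i, j) if i < j else (j, i)


def single_linkage_cophenetic(points):
    """Return input distances and cophenetic distances, keyed by index pair."""
    n = len(points)
    d = {(i, j): dist(points[i], points[j]) for i, j in combinations(range(n), 2)}
    clusters = [frozenset([i]) for i in range(n)]
    coph = {}
    while len(clusters) > 1:
        # Single linkage: merge the two clusters whose closest members are nearest.
        h, a, b = min(
            ((min(d[pair(i, j)] for i in x for j in y), x, y)
             for x, y in combinations(clusters, 2)),
            key=lambda t: t[0],
        )
        # Every pair of objects first joined by this merge gets the merge height.
        for i in a:
            for j in b:
                coph[pair(i, j)] = h
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return d, coph


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(points):
    """Correlate input distances with the distances implied by the dendrogram."""
    d, coph = single_linkage_cophenetic(points)
    keys = sorted(d)
    return pearson([d[k] for k in keys], [coph[k] for k in keys])


def scatter_2d(points):
    """Sums of squares and cross-products about the centroid: (sxx, sxy, syy)."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    sxx = sum((p[0] - cx) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    return sxx, sxy, syy


def wilks_lambda_2d(groups):
    """Wilks' lambda det(W) / det(T) for two-dimensional data (assumed here)."""
    total = scatter_2d([p for g in groups for p in g])
    within = [sum(s) for s in zip(*(scatter_2d(g) for g in groups))]

    def det(m):
        # Determinant of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
        return m[0] * m[2] - m[1] ** 2

    return det(within) / det(total)


# Two clearly separated groups of four points each (illustrative data only).
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

r = cophenetic_correlation(group_a + group_b)
lam = wilks_lambda_2d([group_a, group_b])
print(f"cophenetic correlation = {r:.3f}, Wilks' lambda = {lam:.4f}")
```

A lambda near 0 indicates well-separated group centroids, and a lambda near 1 indicates essentially none. Repeating such a computation over many simulated data sets at a fixed level of separation would be one way to observe how widely the cophenetic correlation varies.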