A validation study of a variable weighting algorithm for cluster analysis

De Soete (1986, 1988) proposed a variable weighting procedure for use when Euclidean distance is the dissimilarity measure and an ultrametric hierarchical clustering method is applied. The algorithm produces weighted distances that approximate ultrametric distances as closely as possible in a least squares sense. The present simulation study examined the effectiveness of the De Soete procedure for a purpose it was not originally designed for: determining whether the algorithm can reduce the influence of variables that are irrelevant to the clustering present in the data. The simulation examined the ability of the procedure to recover a variety of known underlying cluster structures. The results indicate that the algorithm is effective in identifying extraneous variables that contribute no information about the true cluster structure; such variables were typically assigned weights near 0.0. Furthermore, the variable weighting procedure was not adversely affected by the presence of other forms of error in the data. In general, the procedure is recommended for applied analyses in which Euclidean distance is used with ultrametric hierarchical clustering methods.
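The abstract states the weighting criterion only in words. Assuming the weights w_k enter a weighted Euclidean distance d_ij(w) = sqrt(sum_k w_k (x_ik - x_jk)^2), the goal is to choose the weights so that these distances lie as close as possible, in least squares, to an ultrametric u_ij. The Python sketch below illustrates that alternating idea under simplifying assumptions and is not De Soete's own algorithm or the OVWTRE program. The ultrametric is fitted with average linkage (UPGMA), which always yields a valid ultrametric but not necessarily the least-squares-closest one. The weights are updated by a projected gradient step on sum (d_ij - u_ij)^2 with u held fixed. They are then rescaled to sum to the number of variables, an assumed normalization that rules out the trivial all-zero solution. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def squared_diffs(X):
    """a[(i, j)][k] = (x_ik - x_jk)**2 for every object pair i < j."""
    n, p = len(X), len(X[0])
    return {(i, j): [(X[i][k] - X[j][k]) ** 2 for k in range(p)]
            for i, j in combinations(range(n), 2)}


def weighted_distances(a, w):
    """Weighted Euclidean distance d_ij(w) = sqrt(sum_k w_k * a_ijk)."""
    return {pair: math.sqrt(sum(wk * ak for wk, ak in zip(w, diffs)))
            for pair, diffs in a.items()}


def upgma_ultrametric(d, n):
    """Merge-height (cophenetic) distances from average-linkage clustering.

    Stand-in for a least-squares ultrametric fit (an assumption of this sketch).
    """
    members = {i: [i] for i in range(n)}
    dist = dict(d)  # cluster-level distances keyed by (smaller id, larger id)
    u = {}
    next_id = n
    while len(members) > 1:
        (a_id, b_id), h = min(dist.items(), key=lambda kv: kv[1])
        for i in members[a_id]:
            for j in members[b_id]:
                u[(min(i, j), max(i, j))] = h  # cross pairs join at height h
        size_a, size_b = len(members[a_id]), len(members[b_id])
        merged = members.pop(a_id) + members.pop(b_id)
        for c_id in members:  # average-linkage distance to the new cluster
            d_a = dist[(min(a_id, c_id), max(a_id, c_id))]
            d_b = dist[(min(b_id, c_id), max(b_id, c_id))]
            dist[(c_id, next_id)] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        dist = {key: val for key, val in dist.items()
                if a_id not in key and b_id not in key}
        members[next_id] = merged
        next_id += 1
    return u


def least_squares_loss(d, u):
    """Sum over object pairs of (d_ij - u_ij)**2."""
    return sum((d[pair] - u[pair]) ** 2 for pair in d)


def fit_weights(X, n_iter=50, lr=0.1):
    """Alternate an ultrametric fit with a projected-gradient weight step."""
    n, p = len(X), len(X[0])
    a = squared_diffs(X)
    w = [1.0] * p  # start from equal weights
    for _ in range(n_iter):
        d = weighted_distances(a, w)
        u = upgma_ultrametric(d, n)
        # dL/dw_k = sum_ij (d_ij - u_ij) * a_ijk / d_ij, holding u fixed
        grad = [0.0] * p
        for pair, diffs in a.items():
            if d[pair] > 0:
                r = (d[pair] - u[pair]) / d[pair]
                for k in range(p):
                    grad[k] += r * diffs[k]
        g_max = max(abs(g) for g in grad)
        if g_max == 0:
            break
        # Normalized step: each weight moves by at most lr, then is clipped at 0.
        new_w = [max(0.0, wk - lr * gk / g_max) for wk, gk in zip(w, grad)]
        total = sum(new_w)
        if total == 0:
            break  # step wiped out every weight; keep the previous weights
        # ASSUMPTION: weights are normalized to sum to the number of variables.
        w = [p * wk / total for wk in new_w]
    d = weighted_distances(a, w)
    u = upgma_ultrametric(d, n)
    return w, u, least_squares_loss(d, u)


# Usage: two separated groups on variables 0 and 1; variable 2 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(c, 0.3), rng.gauss(c, 0.3), rng.gauss(0.0, 1.0)]
     for c in (0.0, 5.0) for _ in range(6)]
weights, ultra, loss = fit_weights(X)
print([round(wk, 3) for wk in weights], round(loss, 4))
```

On data like this one would hope the noise variable (index 2) receives a comparatively small weight, in line with the study's finding that extraneous variables were assigned weights near 0.0, but this simplified sketch does not guarantee that outcome.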

[1] G. N. Lance et al. Mixed-Data Classificatory Programs I - Agglomerative Systems. 1967. Aust. Comput. J.

[2] J. Kruskal. Toward a Practical Method Which Helps Uncover the Structure of a Set of Multivariate Observations by Finding the Linear Transformation Which Optimizes a New “Index of Condensation”. 1969.

[3] F. Rohlf. Adaptive Hierarchical Clustering Schemes. 1970.

[4] R. M. Cormack et al. A Review of Classification. 1971.

[5] Brian Everitt et al. Cluster analysis. 1974.

[6] John A. Hartigan et al. Clustering Algorithms. 1975.

[7] G. Milligan. Ultrametric hierarchical clustering algorithms. 1979.

[8] G. W. Milligan et al. An examination of the effect of six types of error perturbation on fifteen clustering algorithms. 1980.

[9] G. W. Milligan et al. The validation of four ultrametric clustering algorithms. 1980. Pattern Recognit.

[10] Vladimir Batagelj et al. Note on ultrametric hierarchical clustering algorithms. 1981.

[11] B. Everitt et al. Cluster Analysis (2nd ed.). 1982.

[12] C. L. Mallows et al. A Method for Comparing Two Hierarchical Clusterings: Rejoinder. 1983.

[13] G. W. Milligan et al. The Effect of Cluster Size, Dimensionality, and the Number of Clusters on Recovery of True Cluster Structure. 1983. IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] C. Mallows et al. A Method for Comparing Two Hierarchical Clusterings. 1983.

[15] J. Carroll et al. Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables. 1984.

[16] H. Charles Romesburg et al. Cluster analysis for researchers. 1984.

[17] W. DeSarbo et al. Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm. 1985.

[18] G. W. Milligan et al. An algorithm for generating artificial test clusters. 1985.

[19] Alternating Least Squares Optimal Variable Weighting Algorithms for Ultrametric and Additive Tree Representations. 1986.

[20] G. W. Milligan et al. A Study of the Comparability of External Criteria for Hierarchical Cluster Analysis. 1986. Multivariate Behavioral Research.

[21] G. De Soete. Optimal variable weighting for ultrametric and additive tree clustering. 1986.

[22] Weighted Standardization: A General Data Transformation Method Preceding Classification Procedures. 1986.

[23] A. D. Gordon. A Review of Hierarchical Classification. 1987.

[24] E. Fowlkes et al. Variable selection in clustering. 1988.

[25] G. W. Milligan et al. A study of standardization of variables in cluster analysis. 1988.

[26] G. De Soete. OVWTRE: A program for optimal variable weighting for ultrametric and additive tree fitting. 1988.

[27] G. W. Milligan et al. A Study of the Beta-Flexible Clustering Method. 1989. Multivariate Behavioral Research.