Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm

This paper develops a new methodology that simultaneously estimates, in a least-squares sense, both an ultrametric tree and the associated variable weights for profile data that have been converted into (weighted) Euclidean distances. We first review the relevant classification literature. We then present the methodology, including the alternating least-squares algorithm used to estimate the parameters. The method is tested on a synthetic data set with known structure and then applied to ethnic group rating data. Finally, we mention extensions of the procedure to additive, multiple, and three-way trees.
