Efficient algorithms for agglomerative hierarchical clustering methods

Whenever n objects are characterized by a matrix of pairwise dissimilarities, they may be clustered by any of a number of sequential, agglomerative, hierarchical, nonoverlapping (SAHN) clustering methods. These SAHN clustering methods are defined by a paradigmatic algorithm that usually requires O(n^3) time, in the worst case, to cluster the objects. An improved algorithm (Anderberg 1973), while still requiring O(n^3) worst-case time, can reasonably be expected to exhibit O(n^2) expected behavior. By contrast, we describe a SAHN clustering algorithm that requires O(n^2 log n) time in the worst case. When SAHN clustering methods exhibit reasonable space distortion properties, further improvements are possible. We adapt a SAHN clustering algorithm, based on the efficient construction of nearest neighbor chains, to obtain a reasonably general SAHN clustering algorithm that requires in the worst case O(n^2) time and space.

Whenever n objects are characterized by k-tuples of real numbers, they may be clustered by any of a family of centroid SAHN clustering methods. These methods are based on a geometric model in which clusters are represented by points in k-dimensional real space and points being agglomerated are replaced by a single (centroid) point. For this model, we have solved a class of special packing problems involving point-symmetric convex objects and have exploited it to design an efficient centroid clustering algorithm. Specifically, we describe a centroid SAHN clustering algorithm that requires O(n^2) time, in the worst case, for fixed k and for a family of dissimilarity measures including the Manhattan, Euclidean, Chebychev and all other Minkowski metrics.
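
To make the nearest neighbor chain idea mentioned in the abstract concrete, the following is a minimal Python sketch, not the authors' algorithm as published. It assumes complete linkage as the clustering method (one method commonly treated as having the required space distortion property), a full symmetric dissimilarity matrix given as a list of lists, and the hypothetical function name `nn_chain_complete_linkage`. Each cluster is identified by the smallest index among its members, which is a labeling choice made only for this illustration.

```python
def nn_chain_complete_linkage(dist):
    """Agglomerate n objects with a nearest neighbor chain (illustrative sketch).

    dist: symmetric n x n dissimilarity matrix (list of lists).
    Returns a list of merges (a, b, height), sorted by height, where a and b
    are the smallest member indices of the two clusters being merged.
    """
    n = len(dist)
    d = [row[:] for row in dist]      # working copy, updated after each merge
    active = set(range(n))            # representatives of current clusters
    chain = []                        # current nearest neighbor chain
    merges = []

    while len(active) > 1:
        if not chain:
            chain.append(min(active))  # arbitrary starting cluster

        # Extend the chain until its last two entries are mutual nearest neighbors.
        while True:
            tip = chain[-1]
            prev = chain[-2] if len(chain) > 1 else None
            best, best_d = None, float("inf")
            for c in active:
                if c == tip:
                    continue
                # On ties, prefer the previous chain element so the loop terminates.
                if d[tip][c] < best_d or (d[tip][c] == best_d and c == prev):
                    best, best_d = c, d[tip][c]
            if best == prev:
                break
            chain.append(best)

        # Merge the mutual nearest neighbors at the end of the chain.
        x = chain.pop()
        y = chain.pop()
        a, b = min(x, y), max(x, y)
        merges.append((a, b, d[a][b]))

        # Complete linkage update: distance to the merged cluster is the maximum.
        active.remove(b)
        for c in active:
            if c != a:
                d[a][c] = d[c][a] = max(d[a][c], d[b][c])

    # Chain order need not be height order, so sort before returning.
    return sorted(merges, key=lambda m: m[2])
```

For four points on a line at positions 0, 1, 3 and 7 with absolute difference as the dissimilarity, this sketch merges objects 0 and 1 at height 1, then that cluster with object 2 at height 3, and finally with object 3 at height 7. Each nearest neighbor search scans at most n clusters, and the rest of the chain stays valid after a merge, which is the property an O(n^2) bound relies on.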

[1] H. Hadwiger, et al. Über Treffanzahlen bei translationsgleichen Eikörpern, 1957.

[2] Hugo Hadwiger, et al. Kombinatorische Geometrie in der Ebene, 1959.

[3] Helmut Groemer, et al. Abschätzungen für die Anzahl der konvexen Körper, die einen konvexen Körper berühren, 1961.

[4] B. Grünbaum. On a Conjecture of H. Hadwiger, 1961.

[5] J. H. Ward. Hierarchical Grouping to Optimize an Objective Function, 1963.

[6] W. T. Williams, et al. A Generalized Sorting Strategy for Computer Classifications, 1966, Nature.

[7] G. N. Lance, et al. A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems, 1967, Comput. J.

[8] S. C. Johnson. Hierarchical clustering schemes, 1967, Psychometrika.

[9] J. Gower, et al. Minimum Spanning Trees and Single Linkage Cluster Analysis, 1969.

[10] G. J. S. Ross, et al. Algorithm AS 15: Single Linkage Cluster Analysis, 1969.

[11] David Wishart, et al. 256 NOTE: An Algorithm for Hierarchical Classifications, 1969.

[12] R. M. Cormack, et al. A Review of Classification, 1971.

[13] Michael R. Anderberg, et al. Cluster Analysis for Applications, 1973.

[14] Robin Sibson, et al. SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method, 1973, Comput. J.

[15] Alfred V. Aho, et al. The Design and Analysis of Computer Algorithms, 1974.

[16] Brian Everitt, et al. Cluster analysis, 1974.

[17] Michael Ian Shamos, et al. Closest-point problems, 1975, 16th Annual Symposium on Foundations of Computer Science (sfcs 1975).

[18] John A. Hartigan, et al. Clustering Algorithms, 1975.

[19] L. Hubert, et al. Hierarchical Clustering and the Concept of Space Distortion, 1975.

[20] D. Defays, et al. An Efficient Algorithm for a Complete Link Method, 1977, Comput. J.

[21] Bruce W. Weide, et al. A Survey of Analysis Techniques for Discrete Algorithms, 1977, CSUR.

[22] M. Bruynooghe, et al. Classification ascendante hiérarchique des grands ensembles de données : un algorithme rapide fondé sur la construction des voisinages réductibles, 1978.

[23] F. James Rohlf, et al. A Probabilistic Minimum Spanning Tree Algorithm, 1978, Inf. Process. Lett.

[24] G. Milligan. Ultrametric hierarchical clustering algorithms, 1979.

[25] Frank K. Hwang, et al. An O(n log n) Algorithm for Rectilinear Minimal Spanning Trees, 1979, JACM.

[26] W. Warde, et al. A mathematical comparison of the members of an infinite family of agglomerative clustering algorithms, 1979.

[27] Vladimir Batagelj, et al. Note on ultrametric hierarchical clustering algorithms, 1981.

[28] J. Juan. Programme de classification hiérarchique par l'algorithme de la recherche en chaîne des voisins réciproques, 1982.

[29] Fionn Murtagh, et al. A Survey of Recent Advances in Hierarchical Clustering Algorithms, 1983, Comput. J.