Applying genetic algorithms to search for the best hierarchical clustering of a dataset

Abstract We apply genetic algorithms to find the hierarchical clustering of a dataset that is optimal in the least-squares sense. The approach rests on the bijection between the set of hierarchical classifications of a dataset and the set of ultrametric distances. This bijection makes it possible to measure the quality of a hierarchical classification by computing the L2 norm of the difference between the ultrametric distance matrix associated with the classification and the proximity matrix of the dataset. Our results are shown to improve on previously proposed methods based on the ultrametric.
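As an illustration of the quality measure described in the abstract, the following minimal sketch builds the ultrametric distance matrix induced by a small dendrogram and computes its L2 distance to a proximity matrix. The tree encoding (nested `(height, left, right)` tuples with integer leaves), the function names, the toy data, and the choice to sum over unordered pairs only are all assumptions made for this example; they are not taken from the paper.

```python
import math

def leaves(node):
    """Return the leaf indices under a dendrogram node."""
    if isinstance(node, int):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def ultrametric_matrix(tree, n):
    """u[i][j] is the height of the lowest node that joins leaves i and j."""
    u = [[0.0] * n for _ in range(n)]

    def fill(node):
        if isinstance(node, int):
            return
        height, left, right = node
        # Every pair split between the two subtrees merges at this node.
        for i in leaves(left):
            for j in leaves(right):
                u[i][j] = u[j][i] = float(height)
        fill(left)
        fill(right)

    fill(tree)
    return u

def l2_fit(u, d):
    """L2 norm of (u - d), taken over unordered pairs i < j (an assumption)."""
    n = len(d)
    return math.sqrt(sum((u[i][j] - d[i][j]) ** 2
                         for i in range(n) for j in range(i + 1, n)))

# Hypothetical proximity matrix for three objects.
d = [[0.0, 2.0, 6.0],
     [2.0, 0.0, 5.0],
     [6.0, 5.0, 0.0]]

# Candidate hierarchy: objects 0 and 1 merge at height 2.0,
# then object 2 joins them at height 5.5.
tree = (5.5, (2.0, 0, 1), 2)

u = ultrametric_matrix(tree, 3)
fit = l2_fit(u, d)
print(fit)
```

In a genetic algorithm, a value such as `fit` could serve as the cost to minimize when comparing candidate hierarchies. How the paper encodes hierarchies as chromosomes is not stated in the abstract and is not assumed here.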
