Three Dissimilarity Measures to Contrast Dendrograms

We discuss three dissimilarity measures between dendrograms defined over the same set: the triples, partition, and cluster indices. All three decompose the dendrograms into subsets. For the triples and partition indices, these subsets correspond to binary partitions containing some clusters. For the cluster index, a novel dissimilarity measure introduced in this paper, the subsets are exclusively clusters. In chemical applications, dendrograms gather clusters that carry the similarity information of the data set under study. Therefore, the cluster index is the most suitable dissimilarity measure between dendrograms resulting from chemical investigations. An example applying the three measures is presented to highlight the advantages of the cluster index over the other two methods in similarity studies. Finally, the cluster index is used to measure the differences between the five dendrograms obtained by applying five common hierarchical clustering algorithms to a database of 1000 molecules.
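The abstract does not state how the cluster index turns the two families of clusters into a single number. The sketch below assumes one plausible choice, the fraction of clusters that appear in only one of the two dendrograms (a Jaccard-type distance on the cluster sets). For contrast, it adds a triples count that compares the rooted topology each dendrogram assigns to every three-leaf subset. The nested-tuple encoding and the names `clusters`, `cluster_index`, `resolved_pair`, and `triples_index` are hypothetical illustrations, not the paper's definitions.

```python
from itertools import combinations
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a dendrogram is a nested tuple of string leaf labels.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every internal node (singletons excluded)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def cluster_index(t1: Tree, t2: Tree) -> float:
    """Assumed cluster-based dissimilarity: share of clusters found in only one tree."""
    c1, c2 = clusters(t1), clusters(t2)
    union = c1 | c2
    return len(c1 ^ c2) / len(union) if union else 0.0


def resolved_pair(cl: Set[FrozenSet[str]], triple: Tuple[str, str, str]) -> FrozenSet[str]:
    """Return the pair of the triple grouped apart from the third leaf, if any."""
    for x, y in combinations(triple, 2):
        (z,) = set(triple) - {x, y}
        # In a hierarchy at most one pair can share a cluster that excludes the third leaf.
        if any(x in c and y in c and z not in c for c in cl):
            return frozenset((x, y))
    return frozenset()  # unresolved, e.g. three leaves joined at one multifurcating node


def triples_index(t1: Tree, t2: Tree) -> float:
    """Fraction of three-leaf subsets whose rooted topology differs between trees.

    Assumes both trees share the same leaves and have at least one internal node.
    """
    c1, c2 = clusters(t1), clusters(t2)
    leaves = sorted(max(c1, key=len))  # the root cluster holds every leaf
    triples = list(combinations(leaves, 3))
    differing = sum(resolved_pair(c1, t) != resolved_pair(c2, t) for t in triples)
    return differing / len(triples) if triples else 0.0


t1 = (("a", "b"), ("c", ("d", "e")))
t2 = (("a", "c"), ("b", ("d", "e")))
print(round(cluster_index(t1, t2), 3))  # 0.667
print(triples_index(t1, t2))            # 0.7
```

Swapping leaves b and c leaves 4 of the 6 distinct clusters unshared, but changes the topology of 7 of the 10 triples, so the two measures weigh the same rearrangement differently. Both functions compare leaf sets only, so reordering the children of a node does not change either value.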
