Definitions of "Dissimilarity" for Dissimilarity-Based Compound Selection

Dissimilarity-based compound selection involves identifying a database subset in which the constituent compounds are as dissimilar to each other as possible, thus ensuring coverage of the full range of structural diversity in the original database. This paper provides a quantitative comparison of four different definitions of dissimilarity. Experiments with three different measures of diversity demonstrate that the effectiveness of the selected subset is affected by the definition of dissimilarity that is used, but that it is not possible to identify one such definition as being consistently superior to any other.
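As an illustration of the selection procedure described above, the following is a minimal sketch of greedy dissimilarity-based selection. It is not the implementation evaluated in the paper: the abstract does not name the four definitions of dissimilarity, so the sketch assumes that compounds are represented as sets of fragment identifiers, that pairwise dissimilarity is the Tanimoto complement, and that a candidate's dissimilarity to the already-selected subset is an aggregate (minimum or sum) of its pairwise values. The names `tanimoto_dissimilarity`, `select_dissimilar`, `aggregate`, and `seed` are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence

# Assumption: each compound is a set of integer fragment identifiers.
Fingerprint = FrozenSet[int]


def tanimoto_dissimilarity(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b| (0.0 if both fingerprints are empty)."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def select_dissimilar(
    fingerprints: Sequence[Fingerprint],
    k: int,
    aggregate: Callable[[Iterable[float]], float] = min,
    seed: int = 0,
) -> List[int]:
    """Greedily pick k compound indices, starting from `seed`.

    At each step the candidate with the largest aggregated dissimilarity
    to the current subset is added. `aggregate` is the assumed
    "definition of dissimilarity" between a compound and a subset:
    `min` scores a candidate by its nearest selected neighbour,
    `sum` by its total dissimilarity to all selected compounds.
    """
    if not 0 < k <= len(fingerprints):
        raise ValueError("k must be between 1 and the number of compounds")
    if not 0 <= seed < len(fingerprints):
        raise ValueError("seed must be a valid compound index")

    selected = [seed]
    remaining = set(range(len(fingerprints))) - {seed}
    while len(selected) < k:
        # Iterate in index order so that ties resolve to the lowest index.
        best = max(
            sorted(remaining),
            key=lambda i: aggregate(
                tanimoto_dissimilarity(fingerprints[i], fingerprints[j])
                for j in selected
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

Passing `aggregate=sum` instead of the default `min` changes only how a candidate is scored against the subset, which is the kind of definitional choice the comparison above is concerned with.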
