DISSIM: a program for the analysis of chemical diversity.

As interest in database searching and compound selection has grown, so has interest in the quantification of chemical similarity. This paper describes DISSIM, a computer program that addresses the problem of selecting diverse subsets from larger collections of chemical compounds. It is a pragmatic solution that combines a maximum dissimilarity search algorithm with a general multidimensional measure of chemical similarity built from a combination of different molecular descriptors. The problem of correlation between descriptors is addressed, and appropriate schemes for weighting and normalisation are described. Also described are the application of these techniques to the comparative analysis of topological indices and the use of those indices in chemical diversity analysis and compound selection.
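To make the idea of a maximum dissimilarity search over normalised, weighted descriptors concrete, the following is a minimal sketch in Python. It is not DISSIM's implementation: the function names (`normalise`, `distance`, `max_dissimilarity_subset`), the z-score normalisation, the weighted Euclidean distance and the choice of seed compound are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import math


def normalise(rows):
    """Scale each descriptor column to zero mean and unit variance (assumed scheme)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def distance(a, b, weights):
    """Weighted Euclidean distance between two descriptor vectors (assumed measure)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def max_dissimilarity_subset(rows, k, weights=None, seed_index=0):
    """Greedy MaxMin selection: repeatedly add the compound that is
    furthest from its nearest already-selected compound."""
    data = normalise(rows)
    if weights is None:
        weights = [1.0] * len(data[0])  # equal weighting, a hypothetical default
    selected = [seed_index]
    # nearest[i] holds the distance from compound i to its closest selected compound.
    nearest = [distance(row, data[seed_index], weights) for row in data]
    while len(selected) < min(k, len(data)):
        candidates = (i for i in range(len(data)) if i not in selected)
        nxt = max(candidates, key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, distance(row, data[nxt], weights))
                   for d, row in zip(nearest, data)]
    return selected


if __name__ == "__main__":
    # Four hypothetical compounds described by a single made-up descriptor.
    descriptors = [[0.0], [1.0], [10.0], [5.0]]
    print(max_dissimilarity_subset(descriptors, 3))
```

In this sketch, correlated descriptors could be down-weighted through the `weights` argument, although how the weights should be derived is left open here.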
