On the Properties of Bit String-Based Measures of Chemical Similarity

As interest in database searching and compound selection has grown, the quantification of chemical similarity has become an area of intense practical and theoretical study. One of the most widely used methods of measuring chemical similarity is based on mapping fragments within a molecule to bits within a binary string. We present empirical results that suggest that bit strings provide a nonintuitive encoding of molecular size, shape, and global similarity. Further results, statistical in nature, suggest that the observed behavior of bit string-based searches has a large nonspecific component. On this basis, we question whether bit string-based similarity methods possess all the features desirable in a quantitative chemical distance measure or metric, and we suggest that there are instances when they may not be the most appropriate tool for searching or segregating chemical structures.
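To make the fragment-to-bit mapping concrete, the following is a minimal sketch and not the method evaluated in this work. It makes several assumptions that do not come from the abstract: fragments are given as plain strings (the labels used here are hypothetical), each fragment is hashed to a single position in a 64-bit string, and two bit strings are compared with the Tanimoto coefficient, a common choice for bit-string similarity.

```python
import hashlib


def fingerprint(fragments, n_bits=64):
    """Fold a collection of fragment labels into an integer used as a bit string.

    Assumption: one bit per fragment, chosen by hashing the label.
    Different fragments can collide on the same bit.
    """
    bits = 0
    for frag in fragments:
        digest = hashlib.sha256(frag.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_bits
        bits |= 1 << index
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient: shared set bits divided by bits set in either string."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty bit strings
    return bin(a & b).count("1") / union


# Hypothetical fragment lists for two molecules of different size.
small = ["C-C", "C-O"]
large = ["C-C", "C-O", "C=O", "C-N", "N-H", "c:c"]

fp_small = fingerprint(small)
fp_large = fingerprint(large)
print(tanimoto(fp_small, fp_large))
```

Because every bit set for the smaller fragment list is also set for the larger one, the score in this sketch is governed by how many additional bits the larger molecule sets, which is one simple way a size dependence can enter a bit-string comparison.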
