Database Searching for Compounds with Similar Biological Activity Using Short Binary Bit String Representations of Molecules

In an effort to identify biologically active molecules in compound databases, we have investigated similarity searching using short binary bit strings with a maximum of 54 bit positions. These "minifingerprints" (MFPs) were designed to account for the presence or absence of structural fragments and/or aromatic character, flexibility, and hydrogen-bonding capacity of molecules. MFP design was based on an analysis of distributions of molecular descriptors and structural fragments in two large compound collections. The performance of different MFPs and a reference fingerprint was tested by systematic "one-against-all" similarity searches of molecules in a database containing 364 compounds with different biological activities. For each fingerprint, the most effective similarity cutoff value was determined. An MFP accounting for only 32 structural fragments showed less than 2% false positive similarity matches and correctly assigned on average approximately 40% of the compounds with the same biological activity to a query molecule. Inclusion of three numerical two-dimensional (2D) molecular descriptors increased the performance by 15%. This MFP performed better than a complex 2D fingerprint. At a similarity cutoff value of 0.85, the 2D fingerprint totally eliminated false positives but recognized less than 10% of the compounds within the same activity class.

[1]  Thomas R. Hagadone,et al.  Molecular substructure similarity searching: efficient retrieval in two-dimensional structure databases , 1992, J. Chem. Inf. Comput. Sci..

[2]  H. Matter,et al.  Selecting optimally diverse compounds from structure databases: a validation study of two-dimensional and three-dimensional molecular descriptors. , 1997, Journal of medicinal chemistry.

[3]  R. Babine,et al.  MOLECULAR RECOGNITION OF PROTEIN-LIGAND COMPLEXES : APPLICATIONS TO DRUG DESIGN , 1997 .

[4]  Jonathan A. Ellman,et al.  Synthesis and Applications of Small Molecule Libraries. , 1996, Chemical reviews.

[5]  Eric J. Lien,et al.  Physical Factors Contributing to Hydrophobic Constant π , 1986 .

[6]  M. Charton,et al.  The structural dependence of amino acid hydrophobicity parameters. , 1982, Journal of theoretical biology.

[7]  John Bradshaw,et al.  Identification of Biological Activity Profiles Using Substructural Analysis and Genetic Algorithms , 1998, J. Chem. Inf. Comput. Sci..

[8]  Ajay,et al.  Can we learn to distinguish between "drug-like" and "nondrug-like" molecules? , 1998, Journal of medicinal chemistry.

[9]  Ian A. Watson,et al.  Experimental Designs for Selecting Molecules from Large Chemical Databases , 1997, J. Chem. Inf. Comput. Sci..

[10]  Yvonne C. Martin,et al.  The Information Content of 2D and 3D Structural Descriptors Relevant to Ligand-Receptor Binding , 1997, J. Chem. Inf. Comput. Sci..

[11]  Yvonne C. Martin,et al.  Use of Structure-Activity Data To Compare Structure-Based Clustering Methods and Descriptors for Use in Compound Selection , 1996, J. Chem. Inf. Comput. Sci..

[12]  T. Fujita,et al.  Hydrogen-bonding parameter and its significance in quantitative structure--activity studies. , 1977, Journal of medicinal chemistry.

[13]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[14]  Robert D Clark,et al.  Neighborhood behavior: a useful concept for validation of "molecular diversity" descriptors. , 1996, Journal of medicinal chemistry.

[15]  Darren R. Flower,et al.  On the Properties of Bit String-Based Measures of Chemical Similarity , 1998, J. Chem. Inf. Comput. Sci..

[16]  R. Whistler,et al.  Preparation and antitumor activity of 4'-thio analogs of 2,2'-anhydro-1-beta-D-arabinofuranosylcytosine. , 1974, Journal of medicinal chemistry.

[17]  Robert D. Clark,et al.  Virtual Compound Libraries: A New Approach to Decision Making in Molecular Discovery Research , 1998, J. Chem. Inf. Comput. Sci..

[18]  P. Willett,et al.  A Comparison of Some Measures for the Determination of Inter‐Molecular Structural Similarity Measures of Inter‐Molecular Structural Similarity , 1986 .

[19]  Y. Martin,et al.  Computational methods in molecular diversity and combinatorial chemistry. , 1998, Current opinion in chemical biology.

[20]  Malcolm J. McGregor,et al.  Clustering of Large Databases of Compounds: Using the MDL "Keys" as Structural Descriptors , 1997, J. Chem. Inf. Comput. Sci..

[21]  H. Kubinyi,et al.  A scoring scheme for discriminating between drugs and nondrugs. , 1998, Journal of medicinal chemistry.