Evaluation of Similarity Measures for Searching the Dictionary of Natural Products Database

Similarity searches using combinations of seven similarity coefficients and six representations have been carried out on the Dictionary of Natural Products database. The objective was to discover whether any special methods of searching apply to this database, which is very different in nature from the many synthetic databases studied in previous work on similarity searching. Search effectiveness was assessed by a recall analysis of the search outputs from sets of pharmacologically active target structures. The different target sets produce exceptional but contradictory results for the Russell-Rao and Forbes coefficients. These results have been shown to be due to a dependence on molecular size: Russell-Rao is the coefficient of choice for large structures and Forbes for small structures. Rankings from these results have been combined using a data fusion scheme, and some small gains in performance were normally obtained by using substructural fingerprints and molecular holograms in combination with the Squared Euclidean or Tanimoto coefficients.
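The size dependence described above can be illustrated with a minimal Python sketch. It assumes binary fingerprints stored as sets of "on" bit positions and uses the standard textbook forms of the Tanimoto, Russell-Rao, and Forbes coefficients. The function names, the toy fingerprints, and the rank-sum fusion rule are illustrative assumptions and are not taken from the study, which does not specify its fusion scheme here.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # positions of the bits set to 1


def tanimoto(x: Fingerprint, y: Fingerprint) -> float:
    """a / (a + b + c): common bits over bits set in either fingerprint."""
    a = len(x & y)
    denom = len(x) + len(y) - a
    return a / denom if denom else 0.0


def russell_rao(x: Fingerprint, y: Fingerprint, n_bits: int) -> float:
    """a / n: common bits over fingerprint length (grows with molecule size)."""
    return len(x & y) / n_bits


def forbes(x: Fingerprint, y: Fingerprint, n_bits: int) -> float:
    """n * a / ((a + b) * (a + c)): penalises fingerprints with many bits set."""
    denom = len(x) * len(y)
    return n_bits * len(x & y) / denom if denom else 0.0


def fuse_by_rank_sum(rankings: List[List[str]]) -> List[str]:
    """Hypothetical fusion rule: sum each molecule's rank positions over all
    rankings and sort by the total (lowest total first, ties broken by id)."""
    totals: Dict[str, int] = {}
    for ranking in rankings:
        for position, mol_id in enumerate(ranking, start=1):
            totals[mol_id] = totals.get(mol_id, 0) + position
    return sorted(totals, key=lambda m: (totals[m], m))


# Toy data (assumed): a 16-bit fingerprint space, one target, two candidates.
N_BITS = 16
target = frozenset({1, 2, 3, 4})
small = frozenset({1, 2})                     # few bits set, all shared
large = frozenset({1, 2, 3, 4, 5, 6, 7, 8})   # many bits set, target contained

for name, mol in (("small", small), ("large", large)):
    print(
        name,
        tanimoto(target, mol),
        russell_rao(target, mol, N_BITS),
        forbes(target, mol, N_BITS),
    )
```

In this toy case the Tanimoto coefficient scores both candidates equally, Russell-Rao ranks the larger candidate higher, and Forbes ranks the smaller one higher, which is consistent with the size dependence reported in the abstract. A call such as `fuse_by_rank_sum([ranking_a, ranking_b])` would then merge the ranked lists produced by different coefficient and representation combinations.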
