Random Reduction in Fingerprint Bit Density Improves Compound Recall in Search Calculations Using Complex Reference Molecules

Fingerprints are bit string representations of molecular structure and properties and are widely used to search databases for active molecules. It is well appreciated that molecular complexity and size effects lead to systematic errors in fingerprint similarity searching. For example, several studies have highlighted the caveats associated with the preferential recognition of large compounds, irrespective of their activity, when complex molecules are used as templates for fingerprint search calculations. To systematically study the complexity relationships between reference and database molecules that are relevant for practical fingerprint similarity searching, we designed sets of active molecules with increasing fingerprint bit density relative to average database compounds and potential hits, and carried out systematic similarity search trials. We find that the more complex the reference molecules are, the lower the search performance becomes. A major finding, however, is that randomly deleting bits that are set on in the fingerprints of complex reference molecules generally improves compound recall, even though these random bit density reductions also cause a loss of chemical information content. These results suggest a general search strategy for fingerprints that are sensitive to complexity effects when optimized active compounds are used as reference molecules.
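The effect described above can be illustrated with a minimal sketch. It assumes the Tanimoto coefficient as the similarity metric (the abstract does not name one) and models each fingerprint as the set of indices of its on-bits. The helpers `reduce_bit_density` and `recall_at`, the target bit count, and the toy fingerprints are all hypothetical illustrations, not the fingerprints, data, or protocol used in the study. The one general fact the sketch relies on is that Tanimoto similarity can never exceed min(|A|, |B|) / max(|A|, |B|). A dense reference therefore caps its similarity to smaller hits, and switching off some of its bits raises that cap.

```python
import random
from typing import FrozenSet, Sequence, Set

# A fingerprint is modeled as the set of indices of bits that are set on.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tanimoto_upper_bound(a: Fingerprint, b: Fingerprint) -> float:
    """Highest Tanimoto value reachable given only the bit counts: min/max."""
    larger = max(len(a), len(b))
    return min(len(a), len(b)) / larger if larger else 0.0


def reduce_bit_density(fp: Fingerprint, n_keep: int,
                       rng: random.Random) -> Fingerprint:
    """Hypothetical helper: randomly switch off on-bits until n_keep remain."""
    if n_keep >= len(fp):
        return fp
    # random.sample needs a sequence, so sort the set first (this also makes
    # the result reproducible for a seeded rng).
    return frozenset(rng.sample(sorted(fp), n_keep))


def recall_at(reference: Fingerprint, database: Sequence[Fingerprint],
              active_ids: Set[int], n: int) -> float:
    """Fraction of known actives ranked among the top n database entries."""
    ranked = sorted(range(len(database)),
                    key=lambda i: tanimoto(reference, database[i]),
                    reverse=True)
    found = sum(1 for i in ranked[:n] if i in active_ids)
    return found / len(active_ids)


if __name__ == "__main__":
    reference = frozenset(range(60))     # dense, "complex" reference (toy)
    active = frozenset(range(20))        # smaller active, fully contained
    decoy = frozenset(range(25, 95))     # large inactive compound (toy)
    database = [active, decoy]

    print(tanimoto(reference, active))   # 20/60, about 0.333
    print(tanimoto(reference, decoy))    # 35/95, about 0.368
    print(recall_at(reference, database, {0}, n=1))  # 0.0: decoy ranks first

    # Reduce the reference to 30 on-bits many times and average, since any
    # single random reduction can either help or hurt the active's rank.
    rng = random.Random(0)
    recalls = [recall_at(reduce_bit_density(reference, 30, rng),
                         database, {0}, n=1) for _ in range(100)]
    print(sum(recalls) / len(recalls))
```

In this toy case, the full 60-bit reference ranks the 70-bit decoy above the 20-bit active, even though the active's bits are entirely contained in the reference. Halving the reference's bit count raises the Tanimoto ceiling for the active from 1/3 to 2/3. Averaging over repeated random reductions reflects the fact that individual deletions discard information in an unpredictable way.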
