Similarity Search Profiling Reveals Effects of Fingerprint Scaling in Virtual Screening

Fingerprint scaling is a method for improving the performance of similarity search calculations. It is based on the detection of bit patterns in keyed fingerprints that are signatures of specific compound classes. Applying scaling factors to consensus bits, which are set on in most compounds of a class, emphasizes these signature bit patterns during similarity searching and has been shown to improve search results for different fingerprints. Similarity search profiling has recently been introduced as a method for analyzing similarity search calculations. Profiles separately monitor correctly identified hits and other detected database compounds as a function of the similarity threshold value, making it possible to estimate whether virtual screening calculations can succeed or to evaluate why they fail. Here, we have applied similarity search profiling to study fingerprint scaling in detail and to better understand the effects responsible for its performance, focusing on the qualitative and quantitative analysis of similarity search profiles under scaling conditions. To this end, we carried out systematic similarity search calculations for 23 biological activity classes over a wide range of scaling factors in a compound database containing approximately 1.3 million molecules and monitored these calculations in similarity search profiles. Analysis of these profiles confirmed that scaling increases hit rates and revealed that scaling influences similarity search calculations in different ways. Based on their scaled similarity search profiles, compound sets could be divided into different categories. In a number of cases, increases in search performance under scaling conditions were due to a larger relative increase in correctly identified hits than in detected false positives. This was consistent with the finding that preferred similarity threshold values increased as a result of fingerprint scaling, which was well illustrated by similarity search profiling.
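
The two ideas summarized above, scaling of consensus bits and profiling over similarity thresholds, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes fingerprints are plain lists of 0/1 values, that a consensus bit is one set on in at least a chosen fraction of the class (`min_fraction`, a hypothetical parameter), and that scaling is realized as a weighted Tanimoto coefficient in which consensus bits count `factor` times. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def consensus_bits(fps: Sequence[Sequence[int]], min_fraction: float = 0.9) -> List[bool]:
    """Flag bit positions set on in at least min_fraction of the class fingerprints."""
    n = len(fps)
    return [sum(fp[i] for fp in fps) / n >= min_fraction for i in range(len(fps[0]))]


def scaled_tanimoto(query: Sequence[int], target: Sequence[int],
                    consensus: Sequence[bool], factor: float) -> float:
    """Weighted Tanimoto coefficient (an assumed form of scaling):
    consensus bits contribute `factor`, all other bits contribute 1."""
    both = 0.0    # weighted count of bits set in query AND target
    either = 0.0  # weighted count of bits set in query OR target
    for q, t, c in zip(query, target, consensus):
        w = factor if c else 1.0
        if q and t:
            both += w
        if q or t:
            either += w
    return both / either if either else 0.0


def search_profile(scores: Sequence[float], is_active: Sequence[bool],
                   thresholds: Sequence[float]) -> List[Tuple[float, int, int]]:
    """For each threshold, count separately the known actives (hits) and the
    other database compounds whose similarity reaches the threshold."""
    rows = []
    for th in thresholds:
        hits = sum(1 for s, a in zip(scores, is_active) if a and s >= th)
        others = sum(1 for s, a in zip(scores, is_active) if not a and s >= th)
        rows.append((th, hits, others))
    return rows


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    actives = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 1]]
    database = [[1, 1, 1, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
    labels = [True, False, False]

    mask = consensus_bits(actives)
    for factor in (1.0, 5.0):
        scores = [scaled_tanimoto(actives[0], fp, mask, factor) for fp in database]
        print(factor, search_profile(scores, labels, [0.3, 0.6, 0.9]))
```

Comparing the printed rows for different values of `factor` shows how the hit and non-hit counts shift across thresholds, which is the kind of comparison a scaled similarity search profile makes visible.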
