Rendering Conventional Molecular Fingerprints for Virtual Screening Independent of Molecular Complexity and Size Effects

Molecular complexity and size effects are a known complication of fingerprint similarity searching and virtual screening, and they often increase false-positive rates and decrease hit rates. In standard fingerprints, differences in the complexity of reference and database molecules lead to different fingerprint bit densities. These differences distort similarity search calculations, particularly when fingerprints of reference molecules have a higher bit density than the fingerprints of database compounds. In pharmaceutical research, this is the case in many practical virtual screening applications in which chemically optimized reference molecules are used. Herein we introduce an intuitive computational method that makes standard fingerprints, such as structural keys or pharmacophore feature fingerprints, independent of molecular complexity and size effects. This is achieved by applying the concept of 'balanced codes', which originates in computer science. Following this approach, binary fingerprints are transformed by incorporating the complement of their bit patterns. This straightforward transformation produces fingerprint representations with characteristic bit patterns in which exactly half of the bit positions are set on, corresponding to a constant bit density of 50% for all test compounds, regardless of their complexity and size. In similarity search calculations in the presence of complexity effects of increasing magnitude, transformed structural key and pharmacophore fingerprints consistently perform better than their unmodified counterparts and recover active compounds in cases where the original fingerprints fail.
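
The transformation described above can be illustrated with a minimal sketch. It assumes that "incorporating the complement" means appending the bitwise complement to the original bit string, which is one straightforward way to obtain a balanced code. The function names (`balance`, `bit_density`, `tanimoto`), the 8-bit toy fingerprints, and the use of the Tanimoto coefficient as the similarity measure are illustrative assumptions, not details taken from the abstract.

```python
from typing import List, Sequence


def balance(bits: Sequence[int]) -> List[int]:
    """Append the bitwise complement to a binary fingerprint.

    Assumption: concatenation is used to combine the fingerprint with its
    complement. The result has twice the length and exactly half of its
    positions set on.
    """
    return list(bits) + [1 - b for b in bits]


def bit_density(bits: Sequence[int]) -> float:
    """Fraction of bit positions that are set on."""
    return sum(bits) / len(bits)


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient of two equal-length binary fingerprints
    (an assumed, commonly used similarity measure)."""
    common = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return common / either if either else 1.0


# Hypothetical toy fingerprints: a complex reference molecule with many
# bits set on and a small database compound with few bits set on.
reference = [1, 1, 1, 1, 1, 1, 0, 0]
database = [1, 1, 0, 0, 0, 0, 0, 0]

balanced_reference = balance(reference)
balanced_database = balance(database)

# The original densities differ; the balanced densities are both 0.5.
print(bit_density(reference), bit_density(database))
print(bit_density(balanced_reference), bit_density(balanced_database))
print(tanimoto(balanced_reference, balanced_database))
```

In this sketch every balanced fingerprint sets exactly as many bits as the original fingerprint has positions, so the bit density no longer depends on how many features a molecule has.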
