Do Not Hesitate to Use Tversky - and Other Hints for Successful Active Analogue Searches with Feature Count Descriptors

This study is an exhaustive analysis of neighborhood behavior over a large, coherent data set (ChEMBL target/ligand pairs of known Ki, for 165 targets with more than 50 associated ligands each). It focuses on similarity-based virtual screening (SVS) success, defined by the ascertained optimality index: a weighted compromise between the purity and the retrieval rate of active hits in the neighborhood of an active query. One key issue addressed here is the impact of Tversky asymmetric weighting of query vs candidate features (represented as integer-valued ISIDA colored fragment/pharmacophore triplet count descriptor vectors). Nearly 3/4 million independent SVS runs showed that Tversky scores with a strong bias in favor of query-specific features are by far the most successful and the least failure-prone, compared with a set of nine other dissimilarity scores. These include classical Tanimoto, which failed to defend its privileged status in practical SVS applications. Tversky performance does not depend significantly on the tuning of its bias parameter α: both initial "guesses", α = 0.9 and α = 0.7, were more successful than Tanimoto (which, in turn, outperformed Euclidean distance). Tversky was finally tested in exhaustive similarity searching within the library of 1.6 M commercial and bioactive molecules at http://infochim.u-strasbg.fr/webserv/VSEngine.html, where it compared favorably to Tanimoto in terms of "scaffold hopping" propensity. Therefore, Tversky should be used in SVS at least as often as Tanimoto, perhaps in parallel with it. Analysis with respect to query subclasses highlighted relationships between query complexity (simply expressed in terms of pharmacophore pattern counts) and/or target nature and the likelihood of SVS success. SVS runs using more complex queries are more robust with respect to the choice of their operational premises (descriptors, metric), yet they are best handled by "pro-query" Tversky scores at α > 0.5.
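The sketch below illustrates the scores compared above. It implements count-based Tversky, Tanimoto and Euclidean dissimilarities on sparse integer count vectors. It also computes the purity and retrieval rate of a single SVS run at a fixed dissimilarity cutoff.

The sketch rests on several assumptions, because none of these definitions are given here and the study's own formulas may differ:

- Tversky takes the common form with β = 1 − α: Σ min(q, c) / (α Σ q + (1 − α) Σ c).
- Tanimoto takes the continuous (dot-product) form.
- Dissimilarity is defined as 1 − similarity.

The descriptor labels, the toy library and the 0.3 cutoff are all hypothetical. The study's optimality index is not reproduced, since its exact weighting of purity vs retrieval is not specified here.

```python
from math import sqrt
from typing import Callable, Dict, List, Tuple

# Sparse integer count vector: descriptor label -> occurrence count.
# Labels are hypothetical placeholders, not real ISIDA descriptor names.
Counts = Dict[str, int]


def _shared(q: Counts, c: Counts) -> int:
    """Count-based 'intersection': sum of min(q_i, c_i) over common labels."""
    return sum(min(n, c[k]) for k, n in q.items() if k in c)


def tversky_dissim(q: Counts, c: Counts, alpha: float) -> float:
    """1 - Tversky similarity, ASSUMING beta = 1 - alpha.

    alpha*sum(q) + (1-alpha)*sum(c) == shared + alpha*q_only + (1-alpha)*c_only,
    so alpha > 0.5 is 'pro-query' and alpha < 0.5 is 'pro-candidate'.
    """
    denom = alpha * sum(q.values()) + (1.0 - alpha) * sum(c.values())
    return 1.0 - _shared(q, c) / denom if denom else 1.0


def tanimoto_dissim(q: Counts, c: Counts) -> float:
    """1 - continuous Tanimoto: sum(qc) / (sum(q^2) + sum(c^2) - sum(qc))."""
    qc = sum(n * c.get(k, 0) for k, n in q.items())
    denom = sum(n * n for n in q.values()) + sum(n * n for n in c.values()) - qc
    return 1.0 - qc / denom if denom else 1.0


def euclid_dissim(q: Counts, c: Counts) -> float:
    """Euclidean distance between two sparse count vectors."""
    keys = q.keys() | c.keys()
    return sqrt(sum((q.get(k, 0) - c.get(k, 0)) ** 2 for k in keys))


def purity_and_retrieval(
    query: Counts,
    library: List[Tuple[Counts, bool]],
    dissim: Callable[[Counts, Counts], float],
    cutoff: float,
) -> Tuple[float, float]:
    """Purity = actives among neighbors; retrieval = share of all actives found.

    Neighbors are library members with dissim(query, member) <= cutoff.
    """
    hits = [active for mol, active in library if dissim(query, mol) <= cutoff]
    n_actives = sum(active for _, active in library)
    found = sum(hits)
    purity = found / len(hits) if hits else 0.0
    retrieval = found / n_actives if n_actives else 0.0
    return purity, retrieval


# Hypothetical toy data: (count vector, is_active)
query = {"A": 2, "B": 1}
library = [
    ({"A": 2, "B": 1, "C": 3}, True),  # active "grown" analog: query counts + extras
    ({"A": 1}, False),                 # inactive, shares part of the query
    ({"D": 4}, False),                 # inactive, unrelated
]
tv = purity_and_retrieval(query, library, lambda q, c: tversky_dissim(q, c, 0.9), 0.3)
tn = purity_and_retrieval(query, library, tanimoto_dissim, 0.3)
print("Tversky(0.9):", tv, "Tanimoto:", tn)
```

On this toy library, Tversky at α = 0.9 keeps the grown active analog within the 0.3 cutoff. Tanimoto instead places the smaller inactive molecule closer to the query than the grown active. This only shows the mechanics of the scores and is not a result of the study.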
Among simpler queries, one may distinguish between "growable" queries (allowing for active analogs with additional features) and a few "conservative" queries that do not allow any growth. These conservative queries (typically bioactive amine transporter ligands) form the specific application domain of "pro-candidate" biased Tversky scores at α < 0.5.
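The growable/conservative distinction can be made concrete with a small hypothetical example. It again assumes the count-based Tversky form with β = 1 − α, which is not confirmed by the text. Two candidates are compared against one query:

- A "grown" candidate contains every query feature plus extra ones.
- A "trimmed" candidate lacks one query feature.

A pro-query α should favor the grown analog, as suits growable queries. A pro-candidate α penalizes extra candidate features and should therefore favor the trimmed one, as suits conservative queries. The feature labels and counts are invented for illustration.

```python
from typing import Dict

# Hypothetical pharmacophore-like labels -> counts (not real ISIDA codes).
Counts = Dict[str, int]


def tversky(q: Counts, c: Counts, alpha: float) -> float:
    """Count-based Tversky similarity, ASSUMING beta = 1 - alpha."""
    shared = sum(min(n, c[k]) for k, n in q.items() if k in c)
    denom = alpha * sum(q.values()) + (1.0 - alpha) * sum(c.values())
    return shared / denom if denom else 0.0


query = {"cation": 1, "aromatic": 2}
grown = {"cation": 1, "aromatic": 2, "acceptor": 2}  # all query features + extras
trimmed = {"aromatic": 2}                            # one query feature missing

for alpha in (0.9, 0.5, 0.1):
    s_grown = tversky(query, grown, alpha)
    s_trimmed = tversky(query, trimmed, alpha)
    closer = "grown" if s_grown > s_trimmed else "trimmed"
    print(f"alpha={alpha}: grown={s_grown:.3f} trimmed={s_trimmed:.3f} -> {closer} ranks first")
```

With these toy counts, α = 0.9 ranks the grown analog first. Both α = 0.5 and α = 0.1 rank the trimmed one first, and the gap widens as α decreases.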
