Comparing structural fingerprints using a literature-based similarity benchmark

Background: The concept of molecular similarity is one of the central ideas in cheminformatics, despite the fact that it is ill-defined and rather difficult to assess objectively. Here we propose a practical definition of molecular similarity in the context of drug discovery: molecules A and B are similar if a medicinal chemist would be likely to synthesise and test them around the same time as part of the same medicinal chemistry program. The attraction of such a definition is that it matches one of the key uses of similarity measures in early-stage drug discovery. If we make the assumption that molecules in the same compound activity table in a medicinal chemistry paper were considered similar by the authors of the paper, we can create a dataset of similar molecules from the medicinal chemistry literature. Furthermore, molecules with decreasing levels of similarity to a reference can be found by either ordering molecules in an activity table by their activity, or by considering activity tables in different papers which have at least one molecule in common.

Results: Using this procedure with activity data from ChEMBL, we have created two benchmark datasets for structural similarity that can be used to guide the development of improved measures. Compared to similar results from a virtual screen, these benchmarks are an order of magnitude more sensitive to differences between fingerprints, both because of their size and because they avoid loss of statistical power due to the use of mean scores or ranks. We measure the performance of 28 different fingerprints on the benchmark sets and compare the results to those from the Riniker and Landrum (J Cheminf 5:26, 2013. doi:10.1186/1758-2946-5-26) ligand-based virtual screening benchmark.

Conclusions: Extended-connectivity fingerprints of diameter 4 and 6 are among the best performing fingerprints when ranking diverse structures by similarity, as is the topological torsion fingerprint. However, when ranking very close analogues, the atom pair fingerprint outperforms the others tested. When ranking diverse structures or carrying out a virtual screen, we find that the performance of the ECFP fingerprints significantly improves if the bit-vector length is increased from 1024 to 16,384.

Graphical abstract: An example series from one of the benchmark datasets. Each fingerprint is assessed on its ability to reproduce a specific series order.
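The idea of assessing a fingerprint by how well it reproduces a series order, and the effect of bit-vector length on that assessment, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' procedure: fingerprints are modelled as sets of integer feature IDs, folding is a simple modulo, similarity is the Tanimoto coefficient, agreement with the expected order is measured by Spearman rank correlation, and the names `fold`, `tanimoto`, `series_score`, `REFERENCE` and `SERIES` are hypothetical. The toy data are invented and are not taken from the benchmark.

```python
from typing import Iterable, List, Sequence, Set


def fold(features: Iterable[int], n_bits: int) -> Set[int]:
    """Fold integer feature IDs onto n_bits positions (modulo folding, an assumption)."""
    return {f % n_bits for f in features}


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # no ordering information when all values are tied
    return cov / (vx * vy) ** 0.5


def series_score(reference: Set[int], series: Sequence[Set[int]], n_bits: int) -> float:
    """Correlation between the expected series order and the similarity order.

    `series` is listed from most to least similar to `reference`;
    a score of 1.0 means the fingerprint reproduces that order exactly.
    """
    ref = fold(reference, n_bits)
    sims = [tanimoto(ref, fold(m, n_bits)) for m in series]
    expected = [-i for i in range(len(series))]  # earlier position = higher expected similarity
    return spearman(expected, sims)


# Invented toy data: feature-ID sets for a reference and a four-member series.
REFERENCE = {1, 2, 3, 4, 5, 6}
SERIES = [
    {1, 2, 3, 4, 5, 7},
    {1, 2, 3, 4, 8, 9},
    {1, 2, 10, 11, 12, 13},
    {20, 21, 22, 23, 24, 25},
]

if __name__ == "__main__":
    for n_bits in (8, 16384):
        print(n_bits, series_score(REFERENCE, SERIES, n_bits))
```

With the long bit vector the toy series order is reproduced exactly, whereas folding the same features onto only 8 bits introduces collisions that inflate some similarities and lower the score. This is the kind of effect that a longer bit vector is meant to reduce, although the sizes used here are far smaller than the 1024 and 16,384 bits discussed above.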

[1]  J. Hieble,et al.  alpha-Adrenergic agents. 2. Synthesis and alpha 1-agonist activity of 2-aminotetralins. , 1982, Journal of medicinal chemistry.

[2]  Peter Willett,et al.  The use of 2D fingerprint methods to support the assessment of structural similarity in orphan drug legislation , 2014, Journal of Cheminformatics.

[3]  Peter Willett,et al.  Analysis of Data Fusion Methods in Virtual Screening: Similarity and Group Fusion , 2006, J. Chem. Inf. Model..

[4]  Eric Jones,et al.  SciPy: Open Source Scientific Tools for Python , 2001 .

[5]  Robert D Clark,et al.  Neighborhood behavior: a useful concept for validation of "molecular diversity" descriptors. , 1996, Journal of medicinal chemistry.

[6]  J. Defalco,et al.  5-benzyloxytryptamine as an antagonist of TRPM8. , 2010, Bioorganic & medicinal chemistry letters.

[7]  Lazaros Mavridis,et al.  Comprehensive Comparison of Ligand-Based Virtual Screening Tools Against the DUD Data set Reveals Limitations of Current 3D Methods , 2010, J. Chem. Inf. Model..

[8]  Jürgen Bajorath,et al.  Similarity searching , 2011 .

[9]  Prasenjit Mukherjee,et al.  An overview of molecular fingerprint similarity search in virtual screening , 2016, Expert opinion on drug discovery.

[10]  Christopher I. Bayly,et al.  Evaluating Virtual Screening Methods: Good and Bad Metrics for the "Early Recognition" Problem , 2007, J. Chem. Inf. Model..

[11]  S. Holm A Simple Sequentially Rejective Multiple Test Procedure , 1979 .

[12]  Gerald M. Maggiora,et al.  On Outliers and Activity Cliffs-Why QSAR Often Disappoints , 2006, J. Chem. Inf. Model..

[13]  M. Feth,et al.  5-Substituted 1H-pyrrolo[3,2-b]pyridines as inhibitors of gastric acid secretion. , 2008, Bioorganic & medicinal chemistry.

[14]  Pekka Tiikkainen,et al.  Critical Comparison of Virtual Screening Methods against the MUV Data Set , 2009, J. Chem. Inf. Model..

[15]  P. Willett,et al.  Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures. , 2004, Organic & biomolecular chemistry.

[16]  Dragos Horvath,et al.  Neighborhood Behavior of in Silico Structural Spaces with Respect to in Vitro Activity Spaces-A Novel Understanding of the Molecular Similarity Principle in the Context of Multiple Receptor Binding Profiles , 2003, J. Chem. Inf. Comput. Sci..

[17]  B. Roth,et al.  SAR of psilocybin analogs: discovery of a selective 5-HT 2C agonist. , 2005, Bioorganic & Medicinal Chemistry Letters.

[18]  George Papadatos,et al.  The ChEMBL bioactivity database: an update , 2013, Nucleic Acids Res..

[19]  E. Demont,et al.  Evaluation of basic, heterocyclic ring systems as templates for use as potassium competitive acid blockers (pCABs). , 2009, Bioorganic & medicinal chemistry letters.

[20]  Visakan Kadirkamanathan,et al.  Analysis of Neighborhood Behavior in Lead Optimization and Array Design , 2009, J. Chem. Inf. Model..

[21]  B. Andersson,et al.  Antiulcer agents. 5. Inhibition of gastric H+/K(+)-ATPase by substituted imidazo[1,2-a]pyridines and related analogues and its implication in modeling the high affinity potassium ion binding site of the gastric proton pump enzyme. , 1991, Journal of medicinal chemistry.

[22]  Luhua Lai,et al.  Optimization of 5-hydroxytryptamines as dual function inhibitors targeting phospholipase A2 and leukotriene A4 hydrolase. , 2013, European journal of medicinal chemistry.

[23]  Robert P. Sheridan Alternative Global Goodness Metrics and Sensitivity Analysis: Heuristics to Check the Robustness of Conclusions from Studies Comparing Virtual Screening Methods , 2008, J. Chem. Inf. Model..

[24]  Luc Patiny,et al.  Wikipedia Chemical Structure Explorer: substructure and similarity searching of molecules from Wikipedia , 2015, Journal of Cheminformatics.

[25]  David Rogers,et al.  Extended-Connectivity Fingerprints , 2010, J. Chem. Inf. Model..

[26]  Peter Gedeck,et al.  QSAR - How Good Is It in Practice? Comparison of Descriptor Sets on an Unbiased Cross Section of Corporate Data Sets , 2006, J. Chem. Inf. Model..

[27]  G. L. Grunewald,et al.  Binding requirements of phenolic phenylethylamines in the benzonorbornene skeleton at the active site of phenylethanolamine N-methyltransferase. , 1986, Journal of medicinal chemistry.

[28]  Pierre Baldi,et al.  Large scale study of multiple-molecule queries , 2009, J. Cheminformatics.

[29]  Woody Sherman,et al.  Large-Scale Systematic Analysis of 2D Fingerprint Methods and Parameters to Improve Virtual Screening Enrichments , 2010, J. Chem. Inf. Model..

[30]  Kathrin Heikamp,et al.  Large-Scale Similarity Search Profiling of ChEMBL Compound Data Sets , 2011, J. Chem. Inf. Model..

[31]  G. L. Grunewald,et al.  Conformationally restricted and conformationally defined tyramine analogues as inhibitors of phenylethanolamine N-methyltransferase. , 1989, Journal of medicinal chemistry.

[32]  Sebastian G. Rohrer,et al.  Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data , 2009, J. Chem. Inf. Model..

[33]  Andreas Bender,et al.  How Similar Are Similarity Searching Methods? A Principal Component Analysis of Molecular Descriptor Space , 2009, J. Chem. Inf. Model..

[34]  M. Feth,et al.  Preparation of tetrahydroimidazo[2,1-a]isoquinolines and their use as inhibitors of gastric acid secretion. , 2007, Bioorganic & medicinal chemistry.

[35]  Ramaswamy Nilakantan,et al.  Topological torsion: a new molecular descriptor for SAR applications. Comparison with other descriptors , 1987, J. Chem. Inf. Comput. Sci..

[36]  R. Venkataraghavan,et al.  Atom pairs as molecular features in structure-activity studies: definition and applications , 1985, J. Chem. Inf. Comput. Sci..

[37]  Woody Sherman,et al.  Analysis and comparison of 2D fingerprints: insights into database screening performance using eight fingerprint methods , 2010, J. Cheminformatics.

[38]  V. Setola,et al.  Multi-receptor drug design: Haloperidol as a scaffold for the design and synthesis of atypical antipsychotic agents. , 2012, Bioorganic & medicinal chemistry.

[39]  A. Robertson,et al.  Synthesis and SAR study of 4-arylpiperidines and 4-aryl-1,2,3,6-tetrahydropyridines as 5-HT₂C agonists. , 2012, Bioorganic & medicinal chemistry letters.

[40]  Thierry Kogej,et al.  Comparison of Molecular Fingerprint Methods on the Basis of Biological Profile Data , 2009, J. Chem. Inf. Model..

[41]  Marvin Johnson,et al.  Concepts and applications of molecular similarity , 1990 .

[42]  C. Seyfried,et al.  5-HT reuptake inhibitors with 5-HT(1B/1D) antagonistic activity: a new approach toward efficient antidepressants. , 2000, Journal of medicinal chemistry.

[43]  Adrià Cereto-Massagué,et al.  Molecular fingerprint similarity search in virtual screening. , 2015, Methods.

[44]  U. Lessel,et al.  In vitro and in silico affinity fingerprints: Finding similarities beyond structural classes , 2000 .

[45]  Peter Willett,et al.  The Calculation of Molecular Structural Similarity: Principles and Practice , 2014, Molecular informatics.

[46]  Sereina Riniker,et al.  Open-source platform to benchmark fingerprints for ligand-based virtual screening , 2013, Journal of Cheminformatics.

[47]  G. Maggiora,et al.  Molecular similarity in medicinal chemistry. , 2014, Journal of medicinal chemistry.

[48]  Robert P. Sheridan,et al.  Comparison of Topological, Shape, and Docking Methods in Virtual Screening , 2007, J. Chem. Inf. Model..

[50]  J. Irwin,et al.  Benchmarking sets for molecular docking. , 2006, Journal of medicinal chemistry.

[51]  L. Hedstrom,et al.  Triazole inhibitors of Cryptosporidium parvum inosine 5'-monophosphate dehydrogenase. , 2009, Journal of medicinal chemistry.

[52]  B. Costall,et al.  Synthesis and dopaminergic properties of some exo- and endo-2-aminobenzonorbornenes designed as rigid analogue of dopamine. , 1982, Journal of medicinal chemistry.