MetMaxStruct: A Tversky-Similarity-Based Strategy for Analysing the (Sub)Structural Similarities of Drugs and Endogenous Metabolites

Background: Previous studies compared the molecular similarity of marketed drugs and endogenous human metabolites (endogenites) using a series of fingerprint-type encodings, variously ranked and clustered using the Tanimoto (Jaccard) similarity coefficient (TS). Because the TS gives equal weight to all parts of the encoding, and hence to all substructures in the molecule, it may not be optimal: in many cases not all parts of a molecule will bind to its macromolecular targets, and unsupervised methods alone cannot uncover this. Here we explore the kinds of differences that may be observed when the TS is replaced, in a manner closer to semi-supervised learning, by variants of the asymmetric Tversky (TV) similarity, which includes two weighting parameters, α and β.

Results: When the Tversky α and β parameters are varied away from their Tanimoto values (α = β = 1), dramatic differences are observed in (i) the drug-endogenite similarity heatmaps, (ii) the cumulative “greatest similarity” curves, and (iii) the fraction of drugs whose Tversky similarity to a metabolite exceeds a given value. The same is true when the sum of α and β is varied. A clear trend toward increased endogenite-likeness of marketed drugs is observed when α or β takes values nearer the extremes of its range, and when their sum is smaller. The molecules showing the greatest similarity to two query drugs (chlorpromazine and clozapine) also change, both in identity and in their similarity values, as α and β are varied. The same holds for the converse, when the drugs are queried with an endogenite. The fraction of drugs whose Tversky similarity to a molecule in a library exceeds a given value depends on the contents of that library, so α and β may be “tuned” accordingly, in a semi-supervised manner. For some values of α and β, drug discovery library candidates or natural products can “look” much more like drugs than even endogenites do, in the sense of having substantially higher numerical similarity to them.

Conclusions: Overall, the Tversky similarity metrics provide a more useful range of examples of molecular similarity than the simpler Tanimoto similarity does, and they draw attention to molecular similarities that would go unrecognized if Tanimoto alone were used. Hence, Tversky similarity metrics are likely to be of significant value in many general problems in cheminformatics.
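As a minimal sketch of the quantities discussed above, assume each fingerprint is represented as a Python set of the indices of its “on” bits (a simplification; the encodings actually used are not described here). The Tversky similarity of a query A to a target B is then S = |A ∩ B| / (|A ∩ B| + α|A \ B| + β|B \ A|), which reduces to the Tanimoto coefficient |A ∩ B| / |A ∪ B| when α = β = 1. The function names `tversky`, `greatest_similarity` and `fraction_exceeding` are hypothetical, and the toy fingerprints are invented for illustration; they do not correspond to real drugs or endogenites.

```python
"""Minimal sketch: Tversky vs. Tanimoto similarity on fingerprint bit sets.

Assumption: each fingerprint is a Python set holding the indices of its
"on" bits. Function names are hypothetical, not taken from the paper.
"""
from typing import Dict, Set


def tversky(query: Set[int], target: Set[int], alpha: float, beta: float) -> float:
    """Tversky similarity of query A to target B.

    S = |A & B| / (|A & B| + alpha * |A - B| + beta * |B - A|)
    With alpha = beta = 1 this equals the Tanimoto (Jaccard) coefficient.
    """
    common = len(query & target)        # bits set in both fingerprints
    only_query = len(query - target)    # bits set only in the query
    only_target = len(target - query)   # bits set only in the target
    denom = common + alpha * only_query + beta * only_target
    # Guard against 0/0, e.g. two empty fingerprints.
    return common / denom if denom > 0 else 0.0


def greatest_similarity(
    drugs: Dict[str, Set[int]],
    library: Dict[str, Set[int]],
    alpha: float,
    beta: float,
) -> Dict[str, float]:
    """Highest Tversky similarity of each drug (as query) to any library member."""
    return {
        name: max((tversky(fp, ref, alpha, beta) for ref in library.values()),
                  default=0.0)
        for name, fp in drugs.items()
    }


def fraction_exceeding(scores: Dict[str, float], threshold: float) -> float:
    """Fraction of drugs whose greatest similarity is at least `threshold`."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy fingerprints; they do not encode real molecules.
    drugs = {"drug_1": {1, 2, 3, 4, 5, 6}, "drug_2": {7, 8, 9, 10}}
    library = {"met_1": {1, 2, 3}, "met_2": {7, 8, 20, 21}}

    for alpha, beta in [(1.0, 1.0), (0.1, 0.9), (0.9, 0.1)]:
        scores = greatest_similarity(drugs, library, alpha, beta)
        # Sorted per-drug maxima give the points of a cumulative curve.
        curve = sorted(scores.values())
        print(f"alpha={alpha}, beta={beta}: curve={curve}, "
              f"fraction >= 0.4: {fraction_exceeding(scores, 0.4)}")
```

Swapping query and target swaps the roles of α and β (`tversky(A, B, α, β)` equals `tversky(B, A, β, α)`), so the measure is asymmetric whenever α ≠ β. In the toy data, `met_1` is a subset of `drug_1`; setting α = 0.1 and β = 0.9 weights the bits present only in the drug lightly, so this substructure-like match scores much higher than under Tanimoto.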
