Understanding the foundations of the structural similarities between marketed drugs and endogenous human metabolites

Background: A recent comparison showed extensive similarities between the structural properties of metabolites in the reconstructed human metabolic network (“endogenites”) and those of successful, marketed drugs (“drugs”).

Results: Clustering indicated that endogenites and drugs populate related but distinct regions of chemical space. When drug-endogenite similarity was judged by the Tanimoto coefficient, the differences observed between molecular encodings could be accounted for simply by the fraction of bits set to 1 in the bitstrings that each encoding produces. By extracting drug/endogenite substructures, we develop a novel family of fingerprints, the Drug Endogenite Substructure (DES) encodings, based on the ranked frequency of the various substructures. These provide a natural assessment of drug-endogenite likeness, and may be used as descriptors from which to derive quantitative structure-activity relationships (QSARs).

Conclusions: “Drug-endogenite likeness” seems to have utility, and leads to a simple, novel and interpretable substructure-based molecular encoding for cheminformatics.
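The link between Tanimoto similarity and the fraction of bits set to 1 can be illustrated with a minimal sketch. It is not the authors' workflow: the fingerprints are random on-bit sets rather than real molecular encodings, and the helper names (`tanimoto`, `random_fingerprint`, `mean_random_tanimoto`) are hypothetical. For two unrelated fingerprints with bit density p, the expected overlap is about p² of the bits and the expected union about 2p − p², so the Tanimoto coefficient is roughly p / (2 − p) and rises with density even when the underlying molecules are unrelated.

```python
import random


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def random_fingerprint(n_bits: int, density: float, rng: random.Random) -> set:
    """Hypothetical stand-in for a molecular fingerprint: a random set of on bits."""
    n_on = round(n_bits * density)
    return set(rng.sample(range(n_bits), n_on))


def mean_random_tanimoto(n_bits: int, density: float,
                         n_pairs: int = 200, seed: int = 0) -> float:
    """Average Tanimoto coefficient over pairs of unrelated random fingerprints."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        fp_a = random_fingerprint(n_bits, density, rng)
        fp_b = random_fingerprint(n_bits, density, rng)
        total += tanimoto(fp_a, fp_b)
    return total / n_pairs


if __name__ == "__main__":
    # Sparse versus dense encodings of the same length (densities are illustrative).
    for density in (0.05, 0.1, 0.25, 0.5):
        observed = mean_random_tanimoto(1024, density)
        approx = density / (2 - density)  # rough analytic expectation
        print(f"density={density:.2f}  mean Tanimoto={observed:.3f}  p/(2-p)={approx:.3f}")
```

Under these assumptions, a denser encoding reports higher baseline similarity than a sparser one for the same pairs of unrelated bitstrings, which is why similarities from different encodings are best compared with their bit densities in mind.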
