Predicting the protein targets for athletic performance-enhancing substances

Background: The World Anti-Doping Agency (WADA) publishes the Prohibited List, a manually compiled international standard of substances and methods prohibited in-competition, out-of-competition and in particular sports. It would be ideal to be able to identify all substances that have one or more performance-enhancing pharmacological actions in an automated, fast and cost-effective way. Here, we use experimental data derived from the ChEMBL database (~7,000,000 activity records for 1,300,000 compounds) to build a database model that takes into account both structure and experimental information, and use this database to predict both on-target and off-target interactions between these molecules and targets relevant to doping in sport.

Results: The ChEMBL database was screened and eight well-populated categories of activities (Ki, Kd, EC50, ED50, activity, potency, inhibition and IC50) were used in a rule-based filtering process to label compounds as “active” or “inactive”. The “active” compounds for each of the ChEMBL families were thereby defined, and these populated our bioactivity-based filtered families. A structure-based clustering step was subsequently performed in order to split families with more than one distinct chemical scaffold. This produced refined families, whose members share both a common chemical scaffold and bioactivity against a common target in ChEMBL.

Conclusions: We have used the Parzen-Rosenblatt machine learning approach to test whether compounds in ChEMBL can be correctly predicted to belong to their appropriate refined families. Validation tests using the refined families gave a significant increase in predictivity compared with either the filtered or the original families. Out of 61,660 queries in our Monte Carlo cross-validation, belonging to 19,639 refined families, 41,300 (66.98%) had the parent family as the top prediction and 53,797 (87.25%) had the parent family in the top four hits.
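To make the family-assignment test concrete, the following is a minimal sketch of a Parzen-Rosenblatt (kernel density) classifier with a top-k hit-rate check. It is not the authors' implementation: the fingerprint representation (sets of integer features), the Tanimoto-based distance, the Gaussian kernel, the bandwidth of 0.3, and the family names and toy data are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def parzen_score(query, members, bandwidth=0.3):
    """Parzen-Rosenblatt estimate: mean Gaussian kernel value over a family.

    The distance is taken as 1 - Tanimoto similarity (an assumption).
    """
    total = 0.0
    for member in members:
        distance = 1.0 - tanimoto(query, member)
        total += math.exp(-0.5 * (distance / bandwidth) ** 2)
    return total / len(members)


def rank_families(query, families, bandwidth=0.3):
    """Return family names ordered from highest to lowest Parzen score."""
    scores = {
        name: parzen_score(query, members, bandwidth)
        for name, members in families.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def top_k_hit_rate(queries, families, k):
    """Fraction of held-out queries whose parent family is in the top k."""
    hits = sum(
        1 for fingerprint, parent in queries
        if parent in rank_families(fingerprint, families)[:k]
    )
    return hits / len(queries)


# Hypothetical toy families; real families would hold ChEMBL fingerprints.
families = {
    "steroid_like": [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5})],
    "stimulant_like": [frozenset({10, 11, 12}), frozenset({10, 11, 13})],
    "diuretic_like": [frozenset({20, 21, 22})],
}

# Held-out queries paired with their true parent family.
queries = [
    (frozenset({1, 2, 3, 6}), "steroid_like"),
    (frozenset({10, 11, 14}), "stimulant_like"),
    (frozenset({20, 21, 30}), "diuretic_like"),
]

print(rank_families(queries[0][0], families))
print(top_k_hit_rate(queries, families, k=1))
```

In a Monte Carlo cross-validation of the kind described above, the held-out queries would be drawn at random from the families and removed from them before scoring, and the hit rate would be reported for k = 1 and k = 4.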
Having thus validated our approach, we used it to identify the protein targets associated with the WADA prohibited classes. For compounds for which we have no experimental data, we use their computed patterns of interaction with protein targets to make predictions of bioactivity. We hope that other groups will test these predictions experimentally in the future.
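The rule-based labelling step mentioned in the Results could take a form like the sketch below. The cut-off values, the unit handling and the function name `label_record` are hypothetical and chosen only for illustration; the abstract states that eight activity categories were used but does not give the actual rules.

```python
# Hypothetical cut-offs for illustration only; the source does not state them.
CONCENTRATION_TYPES = {"ki", "kd", "ic50", "ec50", "ed50", "potency"}
PERCENT_TYPES = {"inhibition", "activity"}
MAX_ACTIVE_NM = 10_000.0      # assumed: potency at or below 10 uM counts as active
MIN_ACTIVE_PERCENT = 50.0     # assumed: effect at or above 50% counts as active


def label_record(activity_type, value, units):
    """Label one activity record as 'active' or 'inactive'.

    Returns None when the record cannot be judged by these simple rules,
    for example when the units are not the ones the rule expects.
    """
    kind = activity_type.strip().lower()
    if kind in CONCENTRATION_TYPES and units == "nM":
        return "active" if value <= MAX_ACTIVE_NM else "inactive"
    if kind in PERCENT_TYPES and units == "%":
        return "active" if value >= MIN_ACTIVE_PERCENT else "inactive"
    return None


print(label_record("IC50", 250.0, "nM"))
print(label_record("Inhibition", 20.0, "%"))
print(label_record("ED50", 5.0, "mg/kg"))
```

Records that return `None` here would need either unit conversion or a dedicated rule before they could contribute to a filtered family.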
