eToxPred: a machine learning-based approach to estimate the toxicity of drug candidates

Background: The efficiency of drug development, defined as the number of successfully launched new pharmaceuticals normalized by financial investment, has declined significantly. Nonetheless, recent advances in high-throughput experimental techniques and computational modeling promise to reduce the cost and time required to bring new drugs to market. Predicting the toxicity of drug candidates is an important component of modern drug discovery.

Results: In this work, we describe eToxPred, a new approach to reliably estimate the toxicity and synthetic accessibility of small organic compounds. eToxPred employs machine learning algorithms trained on molecular fingerprints to evaluate drug candidates. Its performance is assessed against multiple datasets containing known drugs, potentially hazardous chemicals, natural products, and synthetic bioactive compounds. Encouragingly, eToxPred predicts synthetic accessibility with a mean square error of only 4% and toxicity with an accuracy of up to 72%.

Conclusions: eToxPred can be incorporated into protocols for constructing custom virtual screening libraries in order to filter out drug candidates that are potentially toxic or would be difficult to synthesize. It is freely available as stand-alone software at https://github.com/pulimeng/etoxpred.
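The filtering step described in the conclusions can be illustrated with a minimal sketch. It assumes that each candidate already carries a predicted toxicity probability and a synthetic accessibility score where higher means harder to synthesize. The `Candidate` class, the `filter_library` function, the compound names, the scores, and both cutoffs are hypothetical and chosen only for illustration; none of them are taken from eToxPred or its interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A screening compound with precomputed (hypothetical) predictions."""
    name: str
    tox_probability: float  # assumed: predicted probability of toxicity, 0..1
    sa_score: float         # assumed: synthetic accessibility, higher = harder


def filter_library(candidates, max_tox=0.5, max_sa=6.0):
    """Keep candidates at or below both cutoffs.

    The default cutoffs are illustrative assumptions, not values
    recommended by the eToxPred authors.
    """
    return [
        c for c in candidates
        if c.tox_probability <= max_tox and c.sa_score <= max_sa
    ]


# Toy library with made-up scores.
library = [
    Candidate("cmpd_A", 0.12, 2.8),  # passes both cutoffs
    Candidate("cmpd_B", 0.81, 3.1),  # rejected: predicted toxic
    Candidate("cmpd_C", 0.30, 7.4),  # rejected: hard to synthesize
    Candidate("cmpd_D", 0.45, 5.9),  # passes both cutoffs
]

kept = filter_library(library)
print([c.name for c in kept])
```

In practice the cutoffs would be tuned to the screening campaign: a stricter toxicity cutoff gives a smaller but safer library, at the cost of discarding some viable candidates.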
