ToxicBlend: Virtual Screening of Toxic Compounds with Ensemble Predictors

Timely assessment of compound toxicity is one of the biggest challenges facing the pharmaceutical industry today. A significant proportion of compounds identified as potential leads is ultimately discarded because of the toxicity they induce. In this paper, we propose a novel machine learning approach for predicting molecular activity on ToxCast targets. We combine extreme gradient boosting with fully-connected and graph-convolutional neural network architectures trained on QSAR physical molecular property descriptors, PubChem molecular fingerprints, and SMILES sequences. Our ensemble predictor leverages the strengths of each individual technique, significantly outperforming existing state-of-the-art models on the ToxCast and Tox21 toxicity-prediction datasets. We provide free access to molecule toxicity prediction using our model at this http URL.
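To make the idea of an ensemble predictor concrete, the following is a minimal sketch of one common way to combine per-model outputs: a weighted average of predicted activity probabilities (soft voting). The abstract does not state how the individual predictors are combined, so the averaging rule, the function name `blend_probabilities`, the model labels, and the example probabilities are all assumptions made for illustration, not the method of the paper.

```python
from typing import Dict, List, Optional, Sequence


def blend_probabilities(
    model_probs: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-model activity probabilities (soft voting).

    model_probs maps a model label to its predicted probabilities, one per
    compound. This combination rule is an assumption for illustration.
    """
    if not model_probs:
        raise ValueError("at least one model is required")
    names = list(model_probs)
    n_compounds = len(model_probs[names[0]])
    if any(len(model_probs[name]) != n_compounds for name in names):
        raise ValueError("all models must score the same number of compounds")
    if weights is None:
        # Default to equal weights, i.e. a plain mean over models.
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(weights[name] * model_probs[name][i] for name in names) / total
        for i in range(n_compounds)
    ]


# Hypothetical probabilities for three compounds on a single assay.
example = {
    "xgboost": [0.90, 0.20, 0.40],
    "fully_connected": [0.80, 0.10, 0.60],
    "graph_conv": [0.70, 0.30, 0.50],
}
blended = blend_probabilities(example)
weighted = blend_probabilities(
    example, {"xgboost": 2.0, "fully_connected": 1.0, "graph_conv": 1.0}
)
```

With equal weights each blended score is the mean of the three model scores; passing `weights` lets a stronger model count for more, for instance with weights chosen on a validation set.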
