Learning Drug Functions from Chemical Structures with Convolutional Neural Networks and Random Forests

Empirical testing of chemicals for drug efficacy costs many billions of dollars every year. The ability to predict the action of molecules in silico would greatly increase the speed and decrease the cost of prioritizing drug leads. Here, we asked whether drug function, defined as MeSH “therapeutic use” classes, can be predicted from chemical structure alone. We evaluated two chemical-structure-derived drug classification methods, chemical images with convolutional neural networks and molecular fingerprints with random forests, both of which outperformed previous predictions that used drug-induced transcriptomic changes as chemical representations. This suggests that the structure of a chemical contains at least as much information about its therapeutic use as the transcriptional cellular response to that chemical. Furthermore, because training data based on chemical structure are not limited to the small set of molecules for which transcriptomic measurements are available, our strategy can leverage more training data to significantly improve predictive accuracy to 83–88%. Finally, we explore the use of these models for prediction of side effects and drug-repurposing opportunities and demonstrate the effectiveness of this modeling strategy for multilabel classification.
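To make the fingerprint-based approach concrete, the following is a minimal, dependency-free sketch of the general idea: turn a structure string into a fixed-length bit vector, then classify it with a bagged ensemble of trees. It is an illustration under stated assumptions, not the pipeline used in this work. The function `toy_fingerprint` hashes SMILES substrings and is only a crude stand-in for a real structural fingerprint; the "forest" is an ensemble of one-split decision stumps rather than full trees; and the four molecules, their class labels, and every name and parameter (`N_BITS`, `fit_forest`, `n_trees`) are hypothetical choices made for the example.

```python
import random
import zlib
from collections import Counter

N_BITS = 256  # hypothetical fingerprint length, chosen for illustration


def toy_fingerprint(smiles, n_bits=N_BITS, max_len=3):
    """Hash every SMILES substring of length 1..max_len into a bit vector.

    Assumption: this is a crude stand-in for a real structural fingerprint;
    it looks at characters only and knows nothing about chemistry.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for i in range(len(smiles) - size + 1):
            bits[zlib.crc32(smiles[i:i + size].encode()) % n_bits] = 1
    return bits


def fit_stump(X, y, rng, n_features):
    """Among a random subset of bits, keep the one whose split errs least."""
    majority = Counter(y).most_common(1)[0][0]
    best = None
    for f in rng.sample(range(len(X[0])), n_features):
        sides = {0: Counter(), 1: Counter()}
        for row, label in zip(X, y):
            sides[row[f]][label] += 1
        # Each side of the split predicts its most common label.
        pred = {v: (c.most_common(1)[0][0] if c else majority)
                for v, c in sides.items()}
        errors = sum(pred[row[f]] != label for row, label in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, f, pred)
    return best[1], best[2]


def fit_forest(X, y, n_trees=200, seed=0):
    """Train n_trees stumps, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx],
                                rng, n_features))
    return forest


def predict(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(pred[x[f]] for f, pred in forest)
    return votes.most_common(1)[0][0]


# Toy training set: illustrative molecules and labels, not data from the study.
TRAIN = [
    ("CC(=O)OC1=CC=CC=C1C(=O)O", "anti-inflammatory"),      # aspirin
    ("CC(C)CC1=CC=C(C=C1)C(C)C(=O)O", "anti-inflammatory"),  # ibuprofen
    ("CN1C=NC2=C1C(=O)N(C(=O)N2C)C", "CNS stimulant"),       # caffeine
    ("CC(N)CC1=CC=CC=C1", "CNS stimulant"),                  # amphetamine
]
X_train = [toy_fingerprint(s) for s, _ in TRAIN]
y_train = [label for _, label in TRAIN]
forest = fit_forest(X_train, y_train)
prediction = predict(forest, toy_fingerprint("CC(=O)OC1=CC=CC=C1C(=O)O"))
```

With four training molecules the prediction itself carries no meaning; the sketch only shows the data flow from structure string to bit vector to ensemble vote. A realistic version would replace `toy_fingerprint` with a cheminformatics fingerprint, replace the stumps with full decision trees, and, for the multilabel setting, fit one such classifier per therapeutic class or use a learner that supports multiple labels directly.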
