Artificial intelligence in chemistry and drug design

The discovery of molecular structures with desired properties for applications in drug discovery, crop protection, or chemical biology is among the most impactful scientific challenges. However, given the complexity of biological systems and the associated cost of experiments and trials, molecular design is also scientifically very challenging, prone to failure, inherently expensive, and time-consuming [1, 2]. To improve our odds and shorten timelines in this process, and to identify good starting points, unbiased incorporation of knowledge through continuous analysis of literature and patents from different scientific fields is required [3]. The number of yearly publications is increasing, and close collaboration between scientific experts across disciplines is required to fully evaluate the potential of a hypothesis. The theoretical space of chemistry, even when limited by molecular size, is huge [4] and dramatically exceeds what we can assess experimentally and even computationally. How can we navigate it efficiently and select molecules that satisfy the multiple parameters that need to be optimized and that are synthetically accessible [5]? The number of existing data points at the beginning of a project is low. How can we enrich projects in short time frames with informative molecules and data that are subsequently used to drive the design? With these questions in mind, it comes as no surprise that data mining and statistics have been integrated into molecular discovery and design pipelines to provide computational support in the prioritization of molecular hypotheses [6, 7]. Machine learning algorithms have been part of the routine toolbox of computational and medicinal chemists for decades. The recent increase in applications and coverage of these methodologies has been attributed to advances in computational power, the growing amount of digitized research data, and an increasing theoretical understanding of the algorithms and their shortcomings.
However, given the gradual character of these developments, it might be counterintuitive to expect a dramatic revolution in molecular design. Nevertheless, extravagant claims have been made for the ability of Artificial Intelligence (AI) to accelerate the design process [8, 9]; how well founded are these claims? While there is unquestionably a lot of potential in novel computational tools, it is important to scrutinize them and compare their performance to that of existing methods, to objectively distinguish real progress from promotion. Only such careful evaluations will shed light on whether novel AI methods contribute to an evolution or a revolution of the established scientific discipline of computer-assisted molecular design [10].

[1]  Denis Fourches,et al.  Exploring drug space with ChemMaps.com , 2018, Bioinform..

[2]  Klaus-Robert Müller,et al.  Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models , 2017, ArXiv.

[3]  Regina Barzilay,et al.  Analyzing Learned Molecular Representations for Property Prediction , 2019, J. Chem. Inf. Model..

[4]  Atul Prakash,et al.  Robust Physical-World Attacks on Deep Learning Visual Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Robert C. Glen,et al.  Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction , 2019, Journal of Computer-Aided Molecular Design.

[6]  Michael J. Keiser,et al.  Adversarial Controls for Scientific Machine Learning. , 2018, ACS chemical biology.

[7]  Vijay S. Pande,et al.  MoleculeNet: a benchmark for molecular machine learning , 2017, Chemical science.

[8]  Floriane Montanari,et al.  Modeling Physico-Chemical ADMET Endpoints with Multitask Graph Convolutional Networks , 2019, Molecules.

[9]  J. Reymond The chemical space project. , 2015, Accounts of chemical research.

[10]  Daniel M. Lowe Extraction of chemical structures and reactions from the literature , 2012 .

[11]  Gisbert Schneider,et al.  Automating drug discovery , 2017, Nature Reviews Drug Discovery.

[12]  Mike Preuss,et al.  Planning chemical syntheses with deep neural networks and symbolic AI , 2017, Nature.

[13]  Vijay S. Pande,et al.  Low Data Drug Discovery with One-Shot Learning , 2016, ACS central science.

[14]  H. Kubinyi QSAR : Hansch analysis and related approaches , 1993 .

[15]  Talia B. Kimber,et al.  Revealing cytotoxic substructures in molecules using deep learning , 2020, Journal of Computer-Aided Molecular Design.

[16]  K. Friedemann Schmidt,et al.  Predictive Multitask Deep Neural Network Models for ADME-Tox Properties: Learning from Large Data Sets , 2019, J. Chem. Inf. Model..

[17]  Djork-Arné Clevert,et al.  De novo generation of hit-like molecules from gene expression signatures using artificial intelligence , 2020, Nature Communications.

[18]  Anthony D. Keefe,et al.  DNA-encoded chemistry: enabling the deeper sampling of chemical space , 2016, Nature Reviews Drug Discovery.

[19]  Douglas Heaven,et al.  Why deep-learning AIs are so easy to fool , 2019, Nature.

[20]  Robert P. Sheridan,et al.  Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships , 2015, J. Chem. Inf. Model..

[21]  I. Kola,et al.  Can the pharmaceutical industry reduce attrition rates? , 2004, Nature Reviews Drug Discovery.

[22]  Andreas Verras,et al.  Is Multitask Deep Learning Practical for Pharma? , 2017, J. Chem. Inf. Model..

[23]  Alekseĭ Grigorʹevich Ivakhnenko,et al.  Cybernetics and forecasting techniques , 1967 .

[24]  Yadi Zhou,et al.  Exploring Tunable Hyperparameters for Deep Neural Networks with Industrial ADME Data Sets. , 2018, Journal of chemical information and modeling.

[25]  Alán Aspuru-Guzik,et al.  Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.

[26]  Andrew R. Leach,et al.  ChEMBL: towards direct deposition of bioassay data , 2018, Nucleic Acids Res..

[27]  Valerie J. Gillet,et al.  Knowledge-Based Approach to de Novo Design Using Reaction Vectors , 2009, J. Chem. Inf. Model..

[28]  Úlfar Erlingsson,et al.  The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets , 2018, ArXiv.

[29]  Igor V. Tetko,et al.  Focused Library Generator: case of Mdmx inhibitors , 2019, Journal of Computer-Aided Molecular Design.

[30]  Andy Liaw,et al.  Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships , 2016, J. Chem. Inf. Model..

[31]  Marwin H. S. Segler,et al.  GuacaMol: Benchmarking Models for De Novo Molecular Design , 2018, J. Chem. Inf. Model..

[32]  Asher Mullard New drugs cost US$2.6 billion to develop , 2014, Nature Reviews Drug Discovery.

[33]  Frank Noé,et al.  Learning Continuous and Data-Driven Molecular Descriptors by Translating Equivalent Chemical Representations , 2018 .

[34]  Dimitar Hristozov,et al.  Enhancing reaction-based de novo design using a multi-label reaction class recommender , 2020, Journal of Computer-Aided Molecular Design.

[35]  Marcus Gastreich,et al.  The next level in chemical space navigation: going far beyond enumerable compound libraries. , 2019, Drug discovery today.

[36]  Chris Morrison AI developers tout revolution, drugmakers talk evolution. , 2019, Nature biotechnology.

[37]  A. Filipa de Almeida,et al.  Synthetic organic chemistry driven by artificial intelligence , 2019, Nature Reviews Chemistry.

[38]  Sebastian Raschka,et al.  Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning , 2018, ArXiv.

[39]  Alán Aspuru-Guzik,et al.  Deep learning enables rapid identification of potent DDR1 kinase inhibitors , 2019, Nature Biotechnology.

[40]  Regina Barzilay,et al.  Are Learned Molecular Representations Ready For Prime Time? , 2019, ArXiv.

[41]  C. Hansch,et al.  p-σ-π Analysis. A Method for the Correlation of Biological Activity and Chemical Structure , 1964 .

[42]  Richard A. Lewis A general method for exploiting QSAR models in lead optimization. , 2005, Journal of medicinal chemistry.

[43]  Connor W. Coley,et al.  A graph-convolutional neural network model for the prediction of chemical reactivity , 2018, Chemical science.

[44]  Wojciech Samek,et al.  Explainable AI: Interpreting, Explaining and Visualizing Deep Learning , 2019, Explainable AI.

[45]  Bin Li,et al.  Applications of machine learning in drug discovery and development , 2019, Nature Reviews Drug Discovery.

[46]  Evan Bolton,et al.  PubChem 2019 update: improved access to chemical data , 2018, Nucleic Acids Res..

[47]  Daniel C. Elton,et al.  Deep learning for molecular generation and optimization - a review of the state of the art , 2019, Molecular Systems Design & Engineering.

[48]  Berkman Sahiner,et al.  Evaluation of data augmentation via synthetic images for improved breast mass detection on mammograms using deep learning , 2019, Journal of medical imaging.

[49]  Andy Liaw,et al.  Demystifying Multitask Deep Neural Networks for Quantitative Structure-Activity Relationships , 2017, J. Chem. Inf. Model..

[50]  D. Sculley,et al.  Hidden Technical Debt in Machine Learning Systems , 2015, NIPS.

[51]  Matthias Rarey,et al.  In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening , 2019, J. Chem. Inf. Model..

[52]  Alán Aspuru-Guzik,et al.  Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC) , 2017 .

[53]  Emma J. Chory,et al.  A Deep Learning Approach to Antibiotic Discovery , 2020, Cell.

[54]  S. Free,et al.  A MATHEMATICAL CONTRIBUTION TO STRUCTURE-ACTIVITY STUDIES. , 1964, Journal of medicinal chemistry.

[55]  Robert P. Sheridan,et al.  Similarity to Molecules in the Training Set Is a Good Discriminator for Prediction Accuracy in QSAR , 2004, J. Chem. Inf. Model..

[56]  W. P. Walters,et al.  Virtual Chemical Libraries. , 2018, Journal of medicinal chemistry.

[57]  Shiguang Shan,et al.  Improving 2D Face Recognition via Discriminative Face Depth Estimation , 2018, 2018 International Conference on Biometrics (ICB).

[58]  John J. Irwin,et al.  ZINC 15 – Ligand Discovery for Everyone , 2015, J. Chem. Inf. Model..

[59]  David B. Searls,et al.  Data integration: challenges for drug discovery , 2005, Nature Reviews Drug Discovery.

[60]  Thomas Blaschke,et al.  The rise of deep learning in drug discovery. , 2018, Drug discovery today.

[61]  Vijay S. Pande,et al.  Molecular graph convolutions: moving beyond fingerprints , 2016, Journal of Computer-Aided Molecular Design.

[62]  Igor I. Baskin,et al.  Machine Learning Methods for Property Prediction in Chemoinformatics: Quo Vadis? , 2012, J. Chem. Inf. Model..

[63]  Connor W. Coley,et al.  Machine Learning in Computer-Aided Synthesis Planning. , 2018, Accounts of chemical research.

[64]  Ulrike Holzgrabe QSAR: Hansch Analysis and Related Approaches, H. Kubiny, VCH, Weinheim 1993. 232 Seiten, 60 Abb. und 32 Tab. 158,– DM. ISBN 3‐527‐30035‐X , 1994 .

[65]  Thomas Blaschke,et al.  Molecular de-novo design through deep reinforcement learning , 2017, Journal of Cheminformatics.

[66]  Dragos Horvath,et al.  Diversifying chemical libraries with generative topographic mapping , 2019, Journal of Computer-Aided Molecular Design.

[67]  Eric J. Martin,et al.  In silico generation of novel, drug-like chemical matter using the LSTM neural network , 2017, ArXiv.

[68]  Andreas Zell,et al.  Estimation of the applicability domain of kernel-based machine learning models for virtual screening , 2010, J. Cheminformatics.

[69]  Stefan Senger,et al.  Correction to: BRADSHAW: a system for automated molecular design , 2019, Journal of Computer-Aided Molecular Design.

[70]  Johannes H. Voigt,et al.  Comparison of the NCI Open Database with Seven Large Chemical Structural Databases , 2001, J. Chem. Inf. Comput. Sci..

[71]  Alán Aspuru-Guzik,et al.  Inverse molecular design using machine learning: Generative models for matter engineering , 2018, Science.

[72]  Jean-Louis Reymond,et al.  Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17 , 2012, J. Chem. Inf. Model..

[73]  Izhar Wallach,et al.  Most Ligand-Based Benchmarks Measure Overfitting Rather than Accuracy , 2017, J. Chem. Inf. Model..

[74]  Thierry Kogej,et al.  Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks , 2017, ACS central science.

[75]  Alexander Binder,et al.  Unmasking Clever Hans predictors and assessing what machines really learn , 2019, Nature Communications.

[76]  K. Wanner,et al.  Methods and Principles in Medicinal Chemistry , 2007 .

[77]  Regina Barzilay,et al.  Junction Tree Variational Autoencoder for Molecular Graph Generation , 2018, ICML.

[78]  Ola Engkvist,et al.  A de novo molecular generation method using latent vector based generative adversarial network , 2019, J. Cheminformatics.

[79]  Dimitar Hristozov,et al.  Validation of Reaction Vectors for de novo Design , 2012 .

[80]  P Schneider,et al.  Multi-objective active machine learning rapidly improves structure–activity models and reveals new protein–protein interaction inhibitors , 2016, Chemical science.

[81]  A. Hopkins,et al.  Navigating chemical space for biology and medicine , 2004, Nature.

[82]  W Patrick Walters,et al.  Assessing the impact of generative AI on medicinal chemistry , 2020, Nature Biotechnology.

[83]  J. Dearden,et al.  How not to develop a quantitative structure–activity or structure–property relationship (QSAR/QSPR) , 2009, SAR and QSAR in environmental research.

[84]  Li Fei-Fei,et al.  ImageNet: Constructing a large-scale image database , 2010 .

[85]  Klaus-Robert Müller,et al.  iNNvestigate neural networks! , 2018, J. Mach. Learn. Res..

[86]  Robert P Sheridan,et al.  Interpretation of QSAR Models by Coloring Atoms According to Changes in Predicted Activity: How Robust Is It? , 2019, J. Chem. Inf. Model..

[87]  Ola Engkvist,et al.  Computational prediction of chemical reactions: current status and outlook. , 2018, Drug discovery today.

[88]  William H. Green,et al.  Using Machine Learning To Predict Suitable Conditions for Organic Reactions , 2018, ACS central science.

[89]  Hugo Ceulemans,et al.  Large-scale comparison of machine learning methods for drug target prediction on ChEMBL , 2018, Chemical science.

[90]  Gisbert Schneider,et al.  Active-learning strategies in computer-assisted drug discovery. , 2015, Drug discovery today.

[91]  G. Schneider,et al.  Rethinking drug design in the artificial intelligence era , 2019, Nature Reviews Drug Discovery.

[92]  Benedict W J Irwin,et al.  Imputation of Assay Bioactivity Data Using Deep Learning , 2019, J. Chem. Inf. Model..

[93]  Darren V. S. Green,et al.  BRADSHAW: a system for automated molecular design , 2019, Journal of Computer-Aided Molecular Design.

[94]  Wojciech Samek,et al.  Explainable ai – preface , 2019 .