GuacaMol: Benchmarking Models for De Novo Molecular Design

De novo design seeks to generate molecules with required property profiles through virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, neural-network-based models for molecular design have recently appeared and show promising results. However, these new models have not been profiled on consistent tasks, and they have rarely been compared with well-established algorithms. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks measure how faithfully models reproduce the property distribution of their training sets, their ability to generate novel molecules, their exploration and exploitation of chemical space, and their performance on a variety of single- and multi-objective optimization tasks. The open-source Python benchmarking code and a leaderboard can be found at https://benevolent.ai/guacamol .
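As a rough illustration of what "the ability to generate novel molecules" could mean in practice, the sketch below computes two simple set-based scores over SMILES strings. It is not the GuacaMol implementation: the function names `uniqueness` and `novelty`, the toy data, and the assumption that all strings are already valid, canonical SMILES are hypothetical choices made for this example. A real benchmark would first canonicalize and validate the strings with a cheminformatics toolkit.

```python
from typing import Iterable, List, Set


def uniqueness(generated: List[str]) -> float:
    """Fraction of generated strings that are distinct (hypothetical helper)."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated: Iterable[str], training: Set[str]) -> float:
    """Fraction of distinct generated strings absent from the training set.

    Assumes every string is already a canonical SMILES, so plain string
    equality stands in for molecule identity.
    """
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - training) / len(unique)


# Toy data, invented for illustration only.
training_set = {"CCO", "c1ccccc1", "CC(=O)O"}
samples = ["CCO", "CCN", "CCN", "CCCl"]

u = uniqueness(samples)            # 3 distinct strings out of 4 samples
n = novelty(samples, training_set)  # 2 of the 3 distinct strings are unseen
print(f"uniqueness={u:.2f}, novelty={n:.2f}")
```

Scores of this kind only capture set membership; they say nothing about whether the generated molecules resemble the training distribution, which is why the abstract lists distribution fidelity as a separate task.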
