Conditional Constrained Graph Variational Autoencoders for Molecule Design

In recent years, deep generative models for graphs have been used to generate new molecules. These models have produced good results, leading to several proposals in the literature. However, they can struggle to learn some of the complex laws governing the chemical world. In this work, we explore using the histogram of atom valences to drive molecule generation in such models. We present the Conditional Constrained Graph Variational Autoencoder (CCGVAE), a model that implements this key idea in a state-of-the-art model and shows improved results on several evaluation metrics on two commonly adopted datasets for molecule generation.
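
To make the conditioning signal concrete, the following is a minimal sketch of how a histogram of atom valences could be computed from a molecular graph. It is an illustration under stated assumptions, not the CCGVAE implementation: the function name `valence_histogram`, the `(i, j, order)` bond encoding, and the choice to count only bonds between heavy atoms (hydrogens left implicit) are all hypothetical.

```python
from collections import Counter


def valence_histogram(num_atoms, bonds, max_valence=4):
    """Count how many atoms have each valence from 1 to max_valence.

    Assumptions (illustrative only): atoms are indexed 0..num_atoms-1,
    each bond is a tuple (i, j, order), and an atom's valence is the sum
    of the orders of its bonds to other heavy atoms.
    """
    valence = [0] * num_atoms
    for i, j, order in bonds:
        # Each bond contributes its order to both endpoints.
        valence[i] += order
        valence[j] += order
    counts = Counter(valence)
    # Entry k of the result is the number of atoms with valence k + 1.
    return [counts.get(v, 0) for v in range(1, max_valence + 1)]


# Heavy-atom graph of acetic acid: C0-C1, C1=O2, C1-O3.
acetic_acid_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
hist = valence_histogram(4, acetic_acid_bonds)
print(hist)  # [2, 1, 0, 1]
```

In this toy encoding, the methyl carbon and the hydroxyl oxygen each have valence 1, the carbonyl oxygen has valence 2, and the carboxyl carbon has valence 4. A vector of this kind is the sort of fixed-length summary that a conditional generative model could take as an additional input.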
