Molecular Hypergraph Grammar with its Application to Molecular Optimization

Molecular optimization aims to discover novel molecules with desirable properties. Two fundamental challenges are: (i) generating valid molecules in a controllable way is non-trivial because of hard chemical constraints such as the valency conditions, and (ii) evaluating a property of a novel molecule is often costly, so the number of property evaluations is limited. These challenges are alleviated to some extent by combining a variational autoencoder (VAE) with Bayesian optimization (BO): the VAE converts a molecule to and from a continuous latent vector, and BO optimizes the latent vector (and hence the corresponding molecule) within a limited number of property evaluations. While the most recent work achieved 100% validity for the first time, its architecture is rather complex, relying on auxiliary neural networks beyond the VAE, which makes it difficult to train. This paper presents the molecular hypergraph grammar variational autoencoder (MHG-VAE), which uses a single VAE to achieve 100% validity. Our key idea is a graph grammar that encodes the hard chemical constraints, called molecular hypergraph grammar (MHG), which guides the VAE to always generate valid molecules. We also present an algorithm to construct an MHG from a set of molecules.
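The VAE-plus-BO pipeline described above can be illustrated with a minimal sketch. The example below is purely hypothetical: it replaces the VAE decoder and the chemical property evaluator with a toy 1-D latent space and a synthetic objective (`property_score`), and runs a small Gaussian-process Bayesian optimization loop with an upper-confidence-bound acquisition over that latent space. None of the function names or modeling choices come from the paper; the MHG decoding step and real property evaluation are elided.

```python
import numpy as np

# Hypothetical stand-in for "decode latent z, then evaluate the molecule's
# property" -- a synthetic, pretend-expensive objective for illustration only.
def property_score(z):
    return float(-(z - 0.3) ** 2 + np.sin(5.0 * z))

def rbf_kernel(a, b, length=0.2):
    """Squared-exponential kernel for a minimal Gaussian-process surrogate."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def bayes_opt(n_init=4, n_iter=10, seed=0):
    """Minimal GP-based Bayesian optimization over the latent interval [-1, 1]."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-1.0, 1.0, size=n_init)          # initial latent samples
    y = np.array([property_score(z) for z in Z])     # their property values
    grid = np.linspace(-1.0, 1.0, 201)               # candidate latent points
    for _ in range(n_iter):
        K = rbf_kernel(Z, Z) + 1e-6 * np.eye(len(Z))
        Ks = rbf_kernel(grid, Z)
        mu = Ks @ np.linalg.solve(K, y)              # GP posterior mean
        var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
        sigma = np.sqrt(np.maximum(var, 1e-12))      # GP posterior std. dev.
        # Upper-confidence-bound acquisition: prefer high mean or high uncertainty.
        z_next = grid[np.argmax(mu + 1.96 * sigma)]
        Z = np.append(Z, z_next)
        y = np.append(y, property_score(z_next))     # one "expensive" evaluation
    return Z[np.argmax(y)], y.max()

best_z, best_y = bayes_opt()
print(f"best latent point: {best_z:.3f}, best property value: {best_y:.3f}")
```

In the actual method, `property_score` would decode the latent vector into a molecular hypergraph via the MHG-constrained decoder before evaluating the property, which is what keeps every query valid.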
