Junction Tree Variational Autoencoder for Molecular Graph Generation

We seek to automate the design of molecules based on specific chemical properties. In computational terms, this task involves continuous embedding and generation of molecular graphs. Our primary contribution is the direct realization of molecular graphs, a task previously approached by generating linear SMILES strings instead of graphs. Our junction tree variational autoencoder generates molecular graphs in two phases, by first generating a tree-structured scaffold over chemical substructures, and then combining them into a molecule with a graph message passing network. This approach allows us to incrementally expand molecules while maintaining chemical validity at every step. We evaluate our model on multiple tasks ranging from molecular generation to optimization. Across these tasks, our model outperforms previous state-of-the-art baselines by a significant margin.

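The tree-structured scaffold in the first phase comes from decomposing each molecule into a junction tree whose nodes are chemical substructures (rings and the bonds outside them). Below is a minimal Python/RDKit sketch of that decomposition idea, as shown after this paragraph; it is a simplified illustration, not the paper's full algorithm, which additionally merges rings sharing more than two atoms and resolves cycles in the cluster graph with a maximum spanning tree. The function name tree_decompose and the example SMILES are illustrative choices of ours.

    # Minimal sketch of junction-tree decomposition, assuming RDKit is installed.
    # Simplified from the paper: the full algorithm also merges bridged rings
    # and takes a maximum spanning tree when the cluster graph has cycles.
    from itertools import combinations
    from rdkit import Chem

    def tree_decompose(smiles):
        """Cluster a molecule into rings and non-ring bonds, then connect
        clusters that share atoms to form the tree-structured scaffold."""
        mol = Chem.MolFromSmiles(smiles)
        # One cluster per ring (smallest set of smallest rings)...
        clusters = [set(ring) for ring in Chem.GetSymmSSSR(mol)]
        # ...plus one cluster per bond that lies outside any ring.
        clusters += [{b.GetBeginAtomIdx(), b.GetEndAtomIdx()}
                     for b in mol.GetBonds() if not b.IsInRing()]
        # Scaffold edges: two clusters are neighbors iff they share an atom.
        edges = [(i, j) for i, j in combinations(range(len(clusters)), 2)
                 if clusters[i] & clusters[j]]
        return clusters, edges

    clusters, edges = tree_decompose("c1ccccc1CC(=O)O")  # phenylacetic acid
    print(len(clusters), "clusters,", len(edges), "scaffold edges")

Because every node of the scaffold is itself a valid chemical substructure, decoding the tree node by node and then assembling neighboring substructures with the graph message passing network lets the model grow the molecule incrementally without ever leaving chemical space.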