Data-Driven Approach to Encoding and Decoding 3-D Crystal Structures

Generative models have achieved impressive results in many domains including image and text generation. In the natural sciences, generative models have led to rapid progress in automated drug discovery. Many of the current methods focus on either 1-D or 2-D representations of typically small, drug-like molecules. However, many molecules require 3-D descriptors and exceed the chemical complexity of commonly used dataset. We present a method to encode and decode the position of atoms in 3-D molecules from a dataset of nearly 50,000 stable crystal unit cells that vary from containing 1 to over 100 atoms. We construct a smooth and continuous 3-D density representation of each crystal based on the positions of different atoms. Two different neural networks were trained on a dataset of over 120,000 three-dimensional samples of single and repeating crystal structures, made by rotating the single unit cells. The first, an Encoder-Decoder pair, constructs a compressed latent space representation of each molecule and then decodes this description into an accurate reconstruction of the input. The second network segments the resulting output into atoms and assigns each atom an atomic number. By generating compressed, continuous latent spaces representations of molecules we are able to decode random samples, interpolate between two molecules, and alter known molecules.

[1]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[2]  C. Macrae,et al.  Mercury CSD 2.0 – new features for the visualization and investigation of crystal structures , 2008 .

[3]  Ryan G. Coleman,et al.  ZINC: A Free Tool to Discover Chemistry for Biology , 2012, J. Chem. Inf. Model..

[4]  Anubhav Jain,et al.  Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis , 2012 .

[5]  Alexandre Varnek,et al.  Estimation of the size of drug-like chemical space based on GDB-17 data , 2013, Journal of Computer-Aided Molecular Design.

[6]  Kristin A. Persson,et al.  Commentary: The Materials Project: A materials genome approach to accelerating materials innovation , 2013 .

[7]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[8]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[9]  Pavlo O. Dral,et al.  Quantum chemistry structures and properties of 134 kilo molecules , 2014, Scientific Data.

[10]  Guangda Niu,et al.  Review of recent progress in chemical stability of perovskite solar cells , 2015 .

[11]  Honglak Lee,et al.  Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.

[12]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[13]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[14]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[15]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[16]  Jiajun Wu,et al.  Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[17]  Theodore Lim,et al.  Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[18]  Vincent Dumoulin,et al.  Deconvolution and Checkerboard Artifacts , 2016 .

[19]  Matt J. Kusner,et al.  Grammar Variational Autoencoder , 2017, ICML.

[20]  Koji Tsuda,et al.  ChemTS: an efficient python library for de novo molecular generation , 2017, Science and technology of advanced materials.

[21]  Hyunsoo Kim,et al.  Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.

[22]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[23]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[24]  Jiajun Wu,et al.  Visual Object Networks: Image Generation with Disentangled 3D Representations , 2018, NeurIPS.

[25]  Andriy Mnih,et al.  Disentangling by Factorising , 2018, ICML.

[26]  Thierry Kogej,et al.  Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks , 2017, ACS central science.

[27]  Alán Aspuru-Guzik,et al.  Accelerating the discovery of materials for clean energy in the era of smart automation , 2018, Nature Reviews Materials.

[28]  Max Welling,et al.  3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data , 2018, NeurIPS.

[29]  Alán Aspuru-Guzik,et al.  Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.

[30]  Loïc Le Folgoc,et al.  Attention U-Net: Learning Where to Look for the Pancreas , 2018, ArXiv.

[31]  Leonidas J. Guibas,et al.  Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[32]  Jeffrey C Grossman,et al.  Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. , 2017, Physical review letters.

[33]  K-R Müller,et al.  SchNet - A deep learning architecture for molecules and materials. , 2017, The Journal of chemical physics.

[34]  Yoshua Bengio,et al.  DEFactor: Differentiable Edge Factorization-based Probabilistic Graph Generation , 2018, ArXiv.

[35]  Regina Barzilay,et al.  Junction Tree Variational Autoencoder for Molecular Graph Generation , 2018, ICML.

[36]  Ming-Yu Liu,et al.  PointFlow: 3D Point Cloud Generation With Continuous Normalizing Flows , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[37]  Nataliya Sokolovska,et al.  CrystalGAN: Learning to Discover Crystallographic Structures with Generative Adversarial Networks , 2018, AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering.

[38]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[39]  Enrico Magli,et al.  Learning Localized Generative Models for 3D Point Clouds via Graph Convolution , 2018, ICLR.

[40]  Ali Razavi,et al.  Generating Diverse High-Fidelity Images with VQ-VAE-2 , 2019, NeurIPS.

[41]  Elman Mansimov,et al.  Molecular Geometry Prediction using a Deep Generative Graph Neural Network , 2019, Scientific Reports.

[42]  Yong-Liang Yang,et al.  HoloGAN: Unsupervised Learning of 3D Representations From Natural Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[43]  Yoshua Bengio,et al.  Tackling Climate Change with Machine Learning , 2019, ACM Comput. Surv..

[44]  Michael Gastegger,et al.  Symmetry-adapted generation of 3d point sets for the targeted discovery of molecules , 2019, NeurIPS.

[45]  Rafael Gómez-Bombarelli,et al.  Generative Models for Automatic Chemical Design , 2019, Machine Learning Meets Quantum Physics.

[46]  Kaushalya Madhawa,et al.  GraphNVP: an Invertible Flow-based Model for Generating Molecular Graphs , 2019 .

[47]  Chi Chen,et al.  Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals , 2018, Chemistry of Materials.

[48]  Motoki Abe,et al.  GraphNVP: An Invertible Flow Model for Generating Molecular Graphs , 2019, ArXiv.

[49]  Ekin D Cubuk,et al.  Screening billions of candidates for solid lithium-ion conductors: A transfer learning approach for small data. , 2019, The Journal of chemical physics.