Generative Chemical Transformer: Neural Machine Learning of Molecular Geometric Structures from Chemical Language via Attention

Discovering new materials suited to specific purposes is important for improving the quality of human life. Here, a neural network that creates molecules meeting multiple desired target conditions based on a deep understanding of chemical language is proposed (generative chemical Transformer, GCT). The attention mechanism in GCT enables a deeper understanding of molecular structures than the chemical language itself conveys: by attending selectively to distant characters, it bridges the semantic discontinuities inherent in the string representation. The significance of language models for inverse molecular design is investigated by quantitatively evaluating the quality of the generated molecules. GCT generates highly realistic chemical strings that satisfy both chemical and linguistic grammar rules, and the molecules parsed from these strings simultaneously satisfy the multiple target properties while remaining diverse for a single condition set. These advances accelerate the discovery of desired materials and thereby contribute to improving the quality of human life.
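The attention mechanism referred to above can be illustrated with a minimal scaled dot-product attention sketch. This is a toy illustration, not the GCT implementation: the vectors stand in for learned embeddings of SMILES characters, and the single-query form omits multi-head projections.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Each key/value pair could correspond to one character of a SMILES
    string; the weights show how strongly the query attends to each
    position, including distant ones (e.g. paired ring-closure digits).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Weighted sum of value vectors.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return weights, out

# A query aligned with the first key attends mostly to position 0.
weights, out = attention(query=[1.0, 0.0],
                         keys=[[1.0, 0.0], [0.0, 1.0]],
                         values=[[1.0], [0.0]])
```

Because the weights form a distribution over all positions, every character can contribute to every other character's representation in a single step, which is what lets the model relate syntactically distant but chemically linked parts of a string.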
