Actively Searching: Inverse Design of Novel Molecules with Simultaneously Optimized Properties.

Combining quantum chemistry characterizations with generative machine learning models has the potential to accelerate molecular searches in chemical space. In this paradigm, quantum chemistry acts as a relatively cost-effective oracle for evaluating the properties of particular molecules, while generative models provide a means of sampling chemical space based on learned structure-function relationships. For practical applications, multiple potentially orthogonal properties must be optimized in tandem during a discovery workflow. This carries additional difficulties associated with the specificity of the targets and the ability of the model to reconcile all properties simultaneously. Here we demonstrate an active learning approach to improve the performance of multi-target generative chemical models. We first demonstrate the effectiveness of a set of baseline models trained on single property prediction tasks in generating novel compounds with various property targets, including both interpolative and extrapolative generation scenarios. For property ranges where accurate targeting proves difficult, the novel compounds suggested by the model are characterized using quantum chemistry to obtain the true values, and the new molecules closest to expressing the desired properties are fed back into the generative model for additional training. This gradually improves the generative models’ understanding of unknown areas of chemical space and shifts the distribution of generated compounds towards the targeted values. We then demonstrate the effectiveness of this active learning approach in generating compounds with multiple chemical constraints, including vertical ionization potential, electron affinity, and dipole moment targets, and validate the results at the ωB97X-D3/def2-TZVP level. This method requires no modifications to extant generative approaches, but rather utilizes their inherent generative and predictive aspects for self-refinement, and can be applied to situations where any number of properties with varying degrees of correlation must be optimized simultaneously.
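The generate, characterize, select, and retrain cycle described above can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `ToyGenerator` is a hypothetical one-dimensional Gaussian standing in for a generative chemical model, `oracle` is a hypothetical linear function standing in for a quantum chemistry calculation, and a single property target is used where the paper treats several.

```python
import random
import statistics


def oracle(x):
    """Hypothetical stand-in for a quantum chemistry property calculation."""
    return 2.0 * x + 1.0


class ToyGenerator:
    """Hypothetical stand-in for a generative model: a 1-D Gaussian sampler."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def sample(self, n, rng):
        return [rng.gauss(self.mean, self.std) for _ in range(n)]

    def retrain(self, examples):
        # Refit to the selected examples; a floor on the spread keeps the
        # sampler exploring (an assumption of this toy, not of the paper).
        self.mean = statistics.fmean(examples)
        self.std = max(statistics.pstdev(examples), 0.3)


def active_learning(generator, target, rounds=10, n_samples=200, keep=20, seed=0):
    """Run the loop and return the mean absolute targeting error per round."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        # 1. Generate candidates from the current model.
        candidates = generator.sample(n_samples, rng)
        # 2. Characterize each candidate with the oracle.
        errors = [abs(oracle(x) - target) for x in candidates]
        history.append(statistics.fmean(errors))
        # 3. Select the candidates closest to the target property value.
        ranked = sorted(zip(errors, candidates))
        best = [x for _, x in ranked[:keep]]
        # 4. Feed the selected candidates back for additional training.
        generator.retrain(best)
    return history


if __name__ == "__main__":
    gen = ToyGenerator(mean=0.0, std=1.0)
    per_round_error = active_learning(gen, target=9.0)
    print("first round error:", per_round_error[0])
    print("last round error:", per_round_error[-1])
```

In this toy setting the target value 9.0 corresponds to a descriptor of 4.0, which lies well outside the initial sampling distribution, so the run loosely mirrors the extrapolative scenario: each round shifts the sampler toward the target. Extending the sketch to several properties would amount to having `oracle` return a vector and ranking candidates by a combined distance, which is again an assumption rather than the paper's stated procedure.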
