Beyond generative models: superfast traversal, optimization, novelty, exploration and discovery (STONED) algorithm for molecules using SELFIES

Inverse design allows the generation of molecules with desirable properties through property optimization. Deep generative models have recently been applied to inverse design because they can optimize molecular properties directly, modifying structures using gradients. While direct property optimization is promising, using deep generative models to solve practical problems requires large amounts of data and long training times. In this work, we propose STONED, a simple and efficient algorithm for interpolation and exploration in chemical space that is comparable to deep generative models. STONED bypasses the need for large amounts of data and training time by applying string modifications to the SELFIES molecular representation. First, we achieve non-trivial performance on typical benchmarks for generative models without any training. Additionally, we demonstrate applications in high-throughput virtual screening for the design of drugs and photovoltaics, and in the construction of chemical paths, which allow both property-based and structure-based interpolation in chemical space. Overall, we anticipate that our results will be a stepping stone toward more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wider adoption.
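
To make the idea of "string modifications in the SELFIES representation" concrete, here is a minimal Python sketch. It is not the authors' implementation. It works on plain strings using only the standard library. The names `ALPHABET`, `tokenize`, `mutate`, and `chemical_path` are hypothetical, the tiny alphabet is an illustrative assumption, and padding the shorter string with `[nop]` is an assumed way of aligning two strings of different length. `mutate` applies one random point mutation (exploration), while `chemical_path` converts one string into another one symbol at a time (structure-based interpolation).

```python
import random
import re

# Hypothetical toy alphabet; a real SELFIES alphabet is considerably larger.
ALPHABET = ["[C]", "[N]", "[O]", "[F]", "[=C]", "[=N]", "[=O]", "[Branch1]", "[Ring1]"]


def tokenize(selfies_string):
    """Split a SELFIES-style string into its bracketed symbols."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def mutate(selfies_string, rng):
    """Apply one random point mutation: insert, replace, or delete a symbol."""
    tokens = tokenize(selfies_string)
    operation = rng.choice(["insert", "replace", "delete"])
    if operation == "insert" or not tokens:
        # Insert a random symbol at a random position (also used for empty input).
        tokens.insert(rng.randint(0, len(tokens)), rng.choice(ALPHABET))
    elif operation == "replace":
        tokens[rng.randrange(len(tokens))] = rng.choice(ALPHABET)
    elif len(tokens) > 1:
        # Delete one symbol, but never return an empty string.
        del tokens[rng.randrange(len(tokens))]
    return "".join(tokens)


def chemical_path(start, target, pad="[nop]"):
    """Walk from `start` to `target` by swapping in one target symbol per step.

    Assumption: the shorter string is right-padded with `pad`, and padding
    symbols are dropped again when each intermediate string is written out.
    """
    current, goal = tokenize(start), tokenize(target)
    length = max(len(current), len(goal))
    current += [pad] * (length - len(current))
    goal += [pad] * (length - len(goal))

    def render(tokens):
        return "".join(t for t in tokens if t != pad)

    path = [render(current)]
    for i in range(length):
        if current[i] != goal[i]:
            current[i] = goal[i]
            path.append(render(current))
    return path


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same mutants
    print([mutate("[C][C][O]", rng) for _ in range(3)])
    print(chemical_path("[C][C][O]", "[C][N][O][F]"))
```

In a real workflow, each generated string would be decoded back to a molecule (for example with the `selfies` package's `decoder` function) before computing any property. That step is omitted here to keep the sketch dependency-free.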
