Population-based de novo molecule generation, using grammatical evolution

Automatic design with machine learning and molecular simulations has shown a remarkable ability to generate new and promising drug candidates. Current models, however, still have problems in simulation concurrency and molecular diversity. Most methods generate one molecule at a time and do not allow multiple simulators to run simultaneously. Additionally, better molecular diversity could boost the success rate in the subsequent drug discovery process. We propose a new population-based approach using grammatical evolution named ChemGE. In our method, a large population of molecules are updated concurrently and evaluated by multiple simulators in parallel. In docking experiments with thymidine kinase, ChemGE succeeded in generating hundreds of high-affinity molecules whose diversity is better than that of known inding molecules in DUD-E.

[1]  Michael M. Hann,et al.  RECAP — Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry. , 1998 .

[2]  Alán Aspuru-Guzik,et al.  Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules , 2016, ACS central science.

[3]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[4]  Olexandr Isayev,et al.  Deep reinforcement learning for de novo drug design , 2017, Science Advances.

[5]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[6]  Matt J. Kusner,et al.  Grammar Variational Autoencoder , 2017, ICML.

[7]  Kenta Hongo,et al.  Bayesian molecular design with a chemical language model , 2017, Journal of Computer-Aided Molecular Design.

[8]  Max Jaderberg,et al.  Population Based Training of Neural Networks , 2017, ArXiv.

[9]  Michael M. Mysinger,et al.  Directory of Useful Decoys, Enhanced (DUD-E): Better Ligands and Decoys for Better Benchmarking , 2012, Journal of medicinal chemistry.

[10]  Xavier Barril,et al.  rDock: A Fast, Versatile and Open Source Program for Docking Ligands to Proteins and Nucleic Acids , 2014, PLoS Comput. Biol..

[11]  Mostapha Benhenda,et al.  ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? , 2017, ArXiv.

[12]  Ryan G. Coleman,et al.  ZINC: A Free Tool to Discover Chemistry for Biology , 2012, J. Chem. Inf. Model..

[13]  David Rogers,et al.  Extended-Connectivity Fingerprints , 2010, J. Chem. Inf. Model..

[14]  Anne Auger,et al.  Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles , 2011, J. Mach. Learn. Res..

[15]  Anthony Brabazon,et al.  Foundations in Grammatical Evolution for Dynamic Environments , 2009, Studies in Computational Intelligence.

[16]  Thierry Kogej,et al.  Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks , 2017, ACS central science.

[17]  David Weininger,et al.  SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules , 1988, J. Chem. Inf. Comput. Sci..

[18]  Alán Aspuru-Guzik,et al.  Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models , 2017, ArXiv.

[19]  Peter Ertl,et al.  Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions , 2009, J. Cheminformatics.

[20]  Ingo Rechenberg,et al.  Case studies in evolutionary experimentation and computation , 2000 .

[21]  Koji Tsuda,et al.  ChemTS: an efficient python library for de novo molecular generation , 2017, Science and technology of advanced materials.