Population-based de novo molecule generation, using grammatical evolution

Automatic design with machine learning and molecular simulations has shown a remarkable ability to generate new and promising drug candidates. Current models, however, still have limitations in simulation concurrency and molecular diversity. Most methods generate one molecule at a time and do not allow multiple simulators to run simultaneously. In addition, greater molecular diversity could raise the success rate in the subsequent drug discovery process. We propose ChemGE, a new population-based approach built on grammatical evolution. In our method, a large population of molecules is updated concurrently and evaluated by multiple simulators in parallel. In docking experiments with thymidine kinase, ChemGE generated hundreds of high-affinity molecules whose diversity exceeds that of the known binding molecules in DUD-E.
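The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming a standard grammatical-evolution decoding and a (μ+λ) evolution strategy; it is not ChemGE's implementation. Each individual is an integer genome, and decoding expands the leftmost nonterminal with production `codon % len(rules)`, wrapping around the genome when codons run out. The tiny SMILES grammar (`GRAMMAR`), the `toy_score` stand-in for a docking score, and all parameter values are hypothetical. Children are scored concurrently through a thread pool, which mirrors dispatching several simulators at once.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy grammar for a tiny SMILES subset (NOT ChemGE's actual grammar).
GRAMMAR = {
    "<smiles>": [["<chain>"]],
    "<chain>": [["<atom>"], ["<atom>", "<chain>"],
                ["<atom>", "(", "<chain>", ")", "<chain>"]],
    "<atom>": [["C"], ["N"], ["O"]],
}


def decode(genome, start="<smiles>", max_steps=200):
    """Map an integer genome to a string by leftmost derivation.
    Returns None if the derivation does not finish within max_steps."""
    symbols, out, i = [start], [], 0
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:          # terminal: emit it in order
            out.append(sym)
            continue
        if i >= max_steps:              # runaway derivation: invalid individual
            return None
        rules = GRAMMAR[sym]
        codon = genome[i % len(genome)]  # wrap around the genome
        i += 1
        symbols = list(rules[codon % len(rules)]) + symbols
    return "".join(out)


def toy_score(smiles):
    """Hypothetical stand-in for a docking score (higher is better)."""
    if smiles is None:
        return float("-inf")
    hetero = smiles.count("N") + smiles.count("O")
    return hetero - 0.5 * abs(len(smiles) - 12)


def evaluate(genome):
    # In a real setting this would launch an external simulator.
    return toy_score(decode(genome))


def mutate(genome, rng, max_codon=255):
    """Replace one randomly chosen codon with a random value."""
    child = list(genome)
    child[rng.randrange(len(child))] = rng.randrange(max_codon + 1)
    return child


def evolve(mu=20, lam=40, genome_len=30, generations=30, seed=0, workers=4):
    """(mu + lambda) evolution strategy with concurrent evaluation."""
    rng = random.Random(seed)
    pop = [[rng.randrange(256) for _ in range(genome_len)] for _ in range(mu)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, pop))
        for _ in range(generations):
            children = [mutate(rng.choice(pop), rng) for _ in range(lam)]
            child_scores = list(pool.map(evaluate, children))  # scored in parallel
            # Plus-selection: parents and children compete for mu slots.
            merged = sorted(zip(scores + child_scores, pop + children),
                            key=lambda t: t[0], reverse=True)[:mu]
            scores = [s for s, _ in merged]
            pop = [g for _, g in merged]
    return sorted(((s, decode(g)) for s, g in zip(scores, pop)),
                  key=lambda t: t[0], reverse=True)


if __name__ == "__main__":
    for score, smiles in evolve()[:5]:
        print(f"{score:6.1f}  {smiles}")
```

With a real docking program, `evaluate` would spend most of its time waiting on an external process, so threads give genuine parallelism despite Python's GIL; purely CPU-bound in-process scoring would call for `ProcessPoolExecutor` instead. Plus-selection also means the best score in the population can never decrease between generations.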