Generating Focussed Molecule Libraries for Drug Discovery with Recurrent Neural Networks

In de novo drug design, computational strategies are used to generate novel molecules with high affinity for the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. To enrich libraries with molecules active towards a given biological target, we propose fine-tuning the model on small sets of molecules that are known to be active against that target. For Staphylococcus aureus, the model reproduced 14% of 6,051 hold-out test molecules designed by medicinal chemists; for Plasmodium falciparum (malaria), it reproduced 28% of 1,240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.
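The abstract frames molecule generation as language modelling over molecular strings, followed by fine-tuning on a small set of actives and counting how many held-out molecules are rediscovered. The sketch below illustrates that workflow only under loudly stated assumptions. It swaps the paper's recurrent neural network for a much simpler count-based character bigram model over SMILES strings. It mimics fine-tuning by adding the active set's transition counts with a larger, arbitrary weight, and it measures reproduction by exact string match rather than by comparing canonicalised structures. The names `CharBigramModel` and `reproduction_rate`, the `^`/`$` delimiters, and the toy molecules are all hypothetical and do not come from the paper.

```python
import random
from collections import Counter, defaultdict

# Hypothetical start/end delimiters; not taken from the paper.
START, END = "^", "$"


class CharBigramModel:
    """Count-based character bigram model over SMILES strings.

    A stand-in for the paper's RNN (an assumption for illustration):
    it estimates P(next char | current char) from training strings.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, smiles_list, weight=1.0):
        # Add weighted transition counts. Calling this a second time on a
        # small target-specific set is a crude analogue of fine-tuning.
        for s in smiles_list:
            seq = START + s + END
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += weight

    def sample(self, rng, max_len=100):
        # Draw characters one at a time until END or max_len is reached.
        out, cur = [], START
        for _ in range(max_len):
            nxt = self.counts[cur]
            chars = list(nxt)
            c = rng.choices(chars, weights=[nxt[ch] for ch in chars])[0]
            if c == END:
                break
            out.append(c)
            cur = c
        return "".join(out)


def reproduction_rate(generated, test_set):
    """Fraction of hold-out test strings that appear among generated strings.

    Assumes exact string match; a real comparison would canonicalise SMILES.
    """
    test = set(test_set)
    return len(test & set(generated)) / len(test)


# Toy usage: "pretrain" on a small generic corpus, then "fine-tune" on actives.
rng = random.Random(0)
general = ["CCO", "CCN", "CCC", "c1ccccc1", "CC(=O)O", "CCOC"]
actives = ["CCN", "CCNC"]

model = CharBigramModel()
model.train(general)             # pretraining analogue
model.train(actives, weight=10)  # fine-tuning analogue (weight is arbitrary)

samples = [model.sample(rng) for _ in range(1000)]
rate = reproduction_rate(samples, actives)
print(f"reproduced {rate:.0%} of the toy active set")
```

A bigram model has no memory beyond the previous character, so many of its samples will not be valid SMILES (for example, unbalanced ring-closure digits or parentheses); capturing such long-range syntax is the motivation for a recurrent model.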
