Model-based reinforcement learning for biological sequence design

The ability to design biological structures such as DNA or proteins would have considerable medical and industrial impact. Doing so presents a challenging black-box optimization problem characterized by a large-batch, low-round setting due to the need for labor-intensive wet-lab evaluations. In response, we propose using reinforcement learning (RL) based on proximal policy optimization (PPO) for biological sequence design. RL provides a flexible framework for optimizing generative sequence models to achieve specific criteria, such as diversity among the high-quality sequences discovered. We propose a model-based variant of PPO, DyNA-PPO, to improve sample efficiency, where the policy for a new round is trained offline using a simulator fit on functional measurements from prior rounds. To accommodate the growing number of observations across rounds, the simulator model is automatically selected at each round from a pool of diverse models of varying capacity. On the tasks of designing DNA transcription factor binding sites, designing antimicrobial proteins, and optimizing the energy of Ising models based on protein structure, we find that DyNA-PPO performs significantly better than existing methods in settings in which modeling is feasible, while still not performing worse in settings in which a reliable model cannot be learned.
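
The round-based procedure summarized above can be made concrete with a short sketch. The snippet below is a minimal, illustrative outline of the DyNA-PPO outer loop, not the authors' implementation: the helpers propose_batch and wet_lab_measure, the toy reward, and the scikit-learn regressor pool are hypothetical placeholders (the paper's policy is a PPO-trained sequence generator and the true reward comes from wet-lab assays), and the offline PPO update against the surrogate is elided. It is meant only to show how candidate simulators are cross-validated each round and how the method falls back to model-free optimization when no reliable model can be fit.

# Minimal sketch of the DyNA-PPO outer loop (illustrative assumptions only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

ALPHABET = "ACGT"   # DNA alphabet; protein design would use 20 amino acids
SEQ_LEN = 8         # toy sequence length

def one_hot(seqs):
    """Encode a list of sequences as flat one-hot vectors."""
    idx = {c: i for i, c in enumerate(ALPHABET)}
    X = np.zeros((len(seqs), SEQ_LEN * len(ALPHABET)))
    for n, s in enumerate(seqs):
        for p, c in enumerate(s):
            X[n, p * len(ALPHABET) + idx[c]] = 1.0
    return X

def propose_batch(rng, batch_size):
    """Stand-in for sampling sequences from the PPO policy."""
    return ["".join(rng.choice(list(ALPHABET), SEQ_LEN)) for _ in range(batch_size)]

def wet_lab_measure(seqs):
    """Stand-in for the expensive experimental assay (toy reward here)."""
    return np.array([s.count("A") / SEQ_LEN for s in seqs])

def fit_and_select_surrogate(X, y, threshold=0.5):
    """Fit a pool of models of varying capacity and keep the best one if its
    cross-validated R^2 clears the threshold; otherwise return None."""
    pool = [Ridge(), KNeighborsRegressor(n_neighbors=3),
            RandomForestRegressor(n_estimators=100)]
    scored = [(cross_val_score(m, X, y, cv=5, scoring="r2").mean(), m) for m in pool]
    best_score, best_model = max(scored, key=lambda t: t[0])
    if best_score < threshold:
        return None                      # fall back to model-free PPO this round
    return best_model.fit(X, y)

rng = np.random.default_rng(0)
all_seqs, all_y = [], np.array([])
for round_idx in range(4):               # small number of experimental rounds
    batch = propose_batch(rng, batch_size=64)
    y = wet_lab_measure(batch)           # one large, expensive batch per round
    all_seqs += batch
    all_y = np.concatenate([all_y, y])

    surrogate = fit_and_select_surrogate(one_hot(all_seqs), all_y)
    if surrogate is not None:
        # Offline phase: many cheap policy updates against the surrogate
        # reward before the next wet-lab round (PPO update itself omitted).
        virtual = propose_batch(rng, batch_size=256)
        virtual_rewards = surrogate.predict(one_hot(virtual))
    print(f"round {round_idx}: best measured reward {all_y.max():.2f}")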
