Unifying Likelihood-free Inference with Black-box Sequence Design and Beyond

Black-box optimization formulations for biological sequence design have drawn recent attention because of their potential impact on the pharmaceutical industry. In this work, we propose to unify two seemingly distinct worlds, likelihood-free inference and black-box sequence design, under one probabilistic framework. In tandem, we provide a recipe for constructing various sequence design methods based on this framework. We show how previous drug discovery approaches can be “reinvented” in our framework, and we further propose new probabilistic sequence design algorithms. Extensive experiments illustrate the benefits of the proposed methodology.
