Bayesian Optimization for Synthetic Gene Design

We address the problem of synthetic gene design using Bayesian optimization. The main difficulty in designing a gene is that the design space consists of long character strings of varying lengths, which makes direct optimization intractable. We propose a three-step approach to deal with this issue. First, we use a Gaussian process model to emulate the behavior of the cell. As inputs to the model, we use a set of biologically meaningful gene features, which allows us to define optimal gene design rules. Second, based on the model outputs, we define a multi-task acquisition function to simultaneously optimize several aspects of interest. Finally, we define an evaluation function, which allows us to rank sets of candidate gene sequences that are coherent with the optimal design strategy. We illustrate the performance of this approach in a real gene design experiment with mammalian cells.
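The three steps can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the two features (GC content and CpG dinucleotide frequency) are hypothetical stand-ins for the biologically meaningful features, the kernel hyperparameters are fixed rather than learned, and a single-output expected-improvement score replaces the multi-task acquisition function. All function names are illustrative.

```python
import math
import numpy as np


def gene_features(seq):
    """Map a variable-length DNA string to a fixed-length feature vector.

    Both features are hypothetical choices made for this sketch.
    """
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    cpg = seq.count("CG") / max(len(seq) - 1, 1)
    return np.array([gc, cpg])


def rbf_kernel(A, B, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_posterior(X, y, Xs, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at test points Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf_kernel(Xs, Xs)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    """Expected improvement over the best observed value (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(t / math.sqrt(2.0)) for t in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.maximum((mean - best) * cdf + std * pdf, 0.0)


def rank_candidates(train_seqs, train_y, candidates):
    """Rank candidate sequences by acquisition value, highest first."""
    X = np.array([gene_features(s) for s in train_seqs])
    Xs = np.array([gene_features(s) for s in candidates])
    y = np.asarray(train_y, dtype=float)
    # Center the targets so the zero-mean GP prior is a reasonable fit.
    mean, std = gp_posterior(X, y - y.mean(), Xs)
    ei = expected_improvement(mean + y.mean(), std, y.max())
    order = np.argsort(-ei)
    return [(candidates[i], float(ei[i])) for i in order]
```

Calling `rank_candidates(train_seqs, train_y, candidates)` with measured sequences, their measured responses, and a list of unmeasured sequences returns the candidates sorted by acquisition value. Because the model operates on features rather than raw strings, sequences of different lengths are handled uniformly.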
