Active and machine learning-based approaches to rapidly enhance microbial chemical production

Making renewable fuels and chemicals from microbes requires new methods for engineering strains more intelligently. Computational approaches to strain engineering for enhanced chemical production typically rely on detailed mechanistic models (e.g., kinetic or stoichiometric models of metabolism), which require many experimental datasets for their parameterization, while experimental methods may require screening large mutant libraries to find the few mutants with desired behaviors. To address these limitations, we developed an active and machine learning approach (ActiveOpt) that guides experiments toward an optimal phenotype using a minimal number of measured datasets. ActiveOpt was applied to two separate case studies to evaluate its potential to increase valine yields and neurosporene productivity in Escherichia coli. In both cases, ActiveOpt identified the best-performing strain in fewer experiments than the case studies used. This work demonstrates that machine and active learning approaches have the potential to greatly facilitate metabolic engineering efforts and help them rapidly achieve their objectives.
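
To make the idea of guiding experiments with a learned model concrete, the following is a minimal sketch of a generic active learning loop over a small combinatorial design space. It is not the ActiveOpt algorithm, whose details are not given in this abstract. The design space (three genes at three expression levels), the `toy_yield` function standing in for a wet-lab measurement, the inverse-distance surrogate model, and the `explore` weight are all assumptions made up for illustration.

```python
import itertools
import math


def toy_yield(design):
    """Hypothetical stand-in for a wet-lab measurement (not real data)."""
    target = (2, 0, 1)  # assumed optimum, invented for this sketch
    return 1.0 - 0.1 * sum((a - b) ** 2 for a, b in zip(design, target))


def distance(a, b):
    """Euclidean distance between two designs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(design, measured):
    """Surrogate model: inverse-distance-weighted mean of measured yields."""
    weights = [(1.0 / (distance(design, d) + 1e-9), y) for d, y in measured.items()]
    total = sum(w for w, _ in weights)
    return sum(w * y for w, y in weights) / total


def active_search(designs, measure, initial, budget, explore=0.1):
    """Measure one design per round, chosen by the surrogate, until the budget is spent."""
    measured = {d: measure(d) for d in initial}
    while len(measured) < budget:
        untested = [d for d in designs if d not in measured]
        if not untested:
            break

        def score(d):
            # Predicted yield plus a bonus for designs far from anything tested so far.
            nearest = min(distance(d, m) for m in measured)
            return predict(d, measured) + explore * nearest

        candidate = max(untested, key=score)
        measured[candidate] = measure(candidate)  # the "experiment"
    best = max(measured, key=measured.get)
    return best, measured


# 3 hypothetical genes x 3 expression levels = 27 candidate strains.
designs = list(itertools.product(range(3), repeat=3))
initial = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
best, measured = active_search(designs, toy_yield, initial, budget=10)
print(best, round(measured[best], 2), len(measured))
```

In this sketch only 10 of the 27 candidate designs are ever measured. A real application would replace `toy_yield` with experimental measurements and the surrogate with a trained machine learning model.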
