A machine learning Automated Recommendation Tool for synthetic biology

Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as on experimental data from real metabolic engineering projects producing renewable biofuels, hoppy-flavored beer without hops, fatty acids, and tryptophan. Finally, we discuss the limitations of this approach and the practical consequences of its underlying assumptions failing.
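
The recommend-with-uncertainty loop described above can be illustrated with a minimal sketch. This is not ART's implementation or API: the bootstrap ensemble of ridge-regularized linear models and the uniform random candidate sampling are stand-ins chosen for brevity, and every name here (`fit_bootstrap_ensemble`, `predict`, `recommend`) as well as the toy data is hypothetical.

```python
import numpy as np

def fit_bootstrap_ensemble(X, y, n_models=50, ridge=1e-3, seed=0):
    """Fit ridge-regularized linear models on bootstrap resamples of (X, y).

    Assumption: a bootstrap ensemble stands in for a probabilistic model.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append an intercept column
    weights = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # resample with replacement
        A, b = Xb[idx], y[idx]
        w = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ b)
        weights.append(w)
    return np.array(weights)  # shape: (n_models, n_features + 1)

def predict(weights, X):
    """Return the ensemble mean and spread of predicted production per design."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    preds = Xb @ weights.T  # shape: (n_designs, n_models)
    return preds.mean(axis=1), preds.std(axis=1)

def recommend(weights, bounds, n_samples=2000, n_recommend=5, seed=1):
    """Sample candidate designs within bounds and keep the best predicted ones.

    Assumption: uniform random sampling stands in for sampling-based optimization.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    candidates = rng.uniform(lo, hi, size=(n_samples, len(lo)))
    mean, std = predict(weights, candidates)
    top = np.argsort(mean)[::-1][:n_recommend]  # highest predicted mean first
    return candidates[top], mean[top], std[top]

# Toy data (made up): two input levels per strain and a noisy production readout.
rng = np.random.default_rng(42)
X_train = rng.uniform(0.0, 1.0, size=(30, 2))
y_train = 3.0 * X_train[:, 0] - X_train[:, 1] + rng.normal(0.0, 0.05, size=30)

ensemble = fit_bootstrap_ensemble(X_train, y_train)
designs, pred_mean, pred_std = recommend(ensemble, np.array([[0.0, 1.0], [0.0, 1.0]]))
for d, m, s in zip(designs, pred_mean, pred_std):
    print(f"design={np.round(d, 2)}  predicted production={m:.2f} +/- {s:.2f}")
```

Each recommended design comes with a spread across ensemble members, so a prediction that looks high but is poorly supported by the training data is visible as such before any strain is built.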
