Data-driven rational biosynthesis design: from molecules to cell factories

Chemical, reaction and enzyme databases have proliferated in recent years, and new computational methods and software tools for data-driven rational biosynthesis design have emerged alongside them. With the arrival of the big-data era, particularly in the biomedical field, data-driven rational biosynthesis design could be useful for constructing target-oriented chassis organisms. The main goal of cell factory design is to engineer the complicated metabolic systems of chassis organisms so that they biosynthesize target molecules from inexpensive biomass. The process of data-driven cell factory design can be divided into several parts: (1) target molecule selection; (2) metabolic reaction and pathway design; (3) prediction of novel enzymes based on protein domain and structure transformation of biosynthetic reactions; (4) construction of large-scale DNA for metabolic pathways; and (5) DNA assembly methods and visualization tools. A one-stop cell factory system could automate design from the molecule level to the chassis level. In this article, we outline the steps of data-driven rational biosynthesis design and provide an overview of the tools related to each step.
