Organizing genome engineering for the gigabase scale

Genome-scale engineering holds great potential to impact science, industry, medicine, and society, and recent improvements in DNA synthesis have enabled the manipulation of megabase genomes. However, coordinating and integrating the workflows and large teams necessary for gigabase genome engineering remains a considerable challenge. We examine this issue and recommend a path forward by: 1) adopting and extending existing representations for designs, assembly plans, samples, data, and workflows; 2) developing new technologies for data curation and quality control; 3) conducting fundamental research on genome-scale modeling and design; and 4) developing new legal and contractual infrastructure to facilitate collaboration. Genome-scale engineering requires the integration of a wide range of in silico and in vivo technologies, as well as data management procedures and legal infrastructure. Here the authors provide a list of recommendations to address these challenges.
