Rapid, Heuristic Discovery and Design of Promoter Collections in Non-Model Microbes for Industrial Applications.

Well-characterized promoter collections for synthetic biology applications are not always available in industrially relevant hosts. We developed a broadly applicable method for promoter identification in atypical microbial hosts that requires no a priori understanding of cis-regulatory element structure. This approach combines bioinformatic filtering with rapid empirical characterization to expand the promoter toolkit, and uses machine learning to improve understanding of the relationship between DNA sequence and function. Here, we apply the method in Geobacillus thermoglucosidasius, a thermophilic organism with high potential as a synthetic biology chassis for industrial applications. Bioinformatic screening of G. kaustophilus, G. stearothermophilus, G. thermodenitrificans, and G. thermoglucosidasius identified 636 putative promoters, each 100 bp long, that encompass the genome-wide design space and lack known transcription factor binding sites. Eighty of these sequences were characterized in vivo, and their activities covered a 2-log (100-fold) range of predictable expression levels. Seven sequences were shown to function consistently regardless of the downstream coding sequence. Partition modeling identified sequence positions upstream of the canonical -35 and -10 consensus motifs that were predicted to strongly influence regulatory activity in Geobacillus. Artificial neural network and partial least squares regression models were then derived to assess whether a simple, forward, quantitative method for in silico prediction of promoter function could be obtained. However, the models were insufficiently general to predict promoter activity in vivo pre hoc, most probably because the training data set was small relative to the size of the modeled design space.
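The mismatch between training set size and design space noted above can be made concrete with a little arithmetic. The sketch below is illustrative only: the promoter length (100 bp) and the number of characterized sequences (80) come from the abstract, but the one-hot encoding is an assumption, since the abstract does not state how sequences were represented for the models, and the `one_hot` helper is hypothetical.

```python
import math

BASES = "ACGT"
PROMOTER_LENGTH = 100  # bp per putative promoter (from the abstract)
N_CHARACTERIZED = 80   # sequences measured in vivo (from the abstract)


def one_hot(seq):
    """Flatten a DNA sequence into four indicator values per position.

    Assumed encoding for illustration; not necessarily the one used
    by the authors.
    """
    vec = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        vec.extend(1 if base == b else 0 for b in BASES)
    return vec


# Number of input features for one 100 bp promoter under this encoding.
n_features = len(BASES) * PROMOTER_LENGTH

# Size of the full sequence space, 4**100, expressed as a power of ten.
log10_design_space = PROMOTER_LENGTH * math.log10(len(BASES))

print(f"features per sequence: {n_features}")
print(f"training examples:     {N_CHARACTERIZED}")
print(f"design space:          about 10^{log10_design_space:.1f} sequences")
```

Under this assumed encoding each sequence has more input features than there are training examples, and the 80 measured promoters sample a vanishingly small fraction of all possible 100 bp sequences, which is consistent with the limited generality the abstract reports.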
