A multi-scale transcriptional regulatory network knowledge base for Escherichia coli

Transcriptomic data are accumulating rapidly, so scalable methods for extracting knowledge from them are critical. We assembled a top-down transcriptional regulatory network for Escherichia coli from a 1035-sample, single-protocol, high-quality RNA-seq compendium. The compendium covers diverse growth conditions: 4 temperatures, 9 media, 39 supplements (including antibiotics), and 76 unique gene knockouts. Using unsupervised machine learning, we extracted 117 regulatory modules that account for 86% of known regulatory network interactions. We also identified two novel regulons. After expanding the compendium with 1675 publicly available samples, we extracted similar modules, highlighting the method's scalability and stability. We provide workflows to enable analysis of new user data against this knowledge base, and demonstrate its utility for experimental design. This work provides a blueprint for top-down regulatory network elucidation across organisms using existing data, without any prior annotation.

Highlights

- Single-protocol, high-quality RNA-seq dataset contains 1035 samples from Escherichia coli covering a wide range of growth conditions
- Machine learning identifies 117 regulatory modules that capture the majority of known regulatory interactions
- Resulting knowledge base combines expression levels and module activities to enable regulon discovery and empower novel experimental design
- Standard workflows provided to enable application of knowledge base to new user data
