Title: The Escherichia coli Transcriptome Consists of Independently Regulated Modules

Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we applied unsupervised learning to a compendium of high-quality Escherichia coli RNA-seq datasets to identify 70 statistically independent signals that modulate the expression of specific gene sets. We show that 50 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals was validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provided: (1) a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations, (2) a guide to gene and regulator function discovery, and (3) a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation forms an underlying principle that describes the composition of a model prokaryotic transcriptome.

Main: The transcriptional regulatory network (TRN) senses and integrates complex environmental and intracellular information to coordinate a cell's gene expression. Reverse engineering the TRN informs how an organism responds to diverse stresses and unfamiliar environments. A fully characterized TRN would enable the prediction and mechanistic explanation of an organism's dynamic adaptation to environmental or genetic perturbations. Reconstruction of a genome-scale TRN requires a substantial number of experiments to integrate the binding sites for each regulator and characterize their activities. Unlike eukaryotic TRNs, which contain highly connected co-associations, prokaryotic TRNs exhibit a simpler structure; over 75% of genes in the model bacterium Escherichia coli are known targets of two or fewer transcription factors (TFs) (Fig. S1a).
The TRN structure is encoded in the genome as regulator binding sites and is invariant to environmental dynamics. However, environmental and genetic perturbations alter the activity states of transcriptional regulators to change their DNA binding affinity, which in turn modulates the transcriptome in a condition-specific manner. Thus, a measured expression profile reflects a combination of the activity of all transcriptional regulators under the examined condition, posing a fundamental deconvolution challenge. Compendia of expression profiles have been leveraged to infer TRNs by identifying shared patterns across gene expression profiles, rather than using direct DNA-TF binding information. Many inference methods define groups of genes, or modules, with similar expression profiles that are often functionally related or co-expressed. A recent review showed that independent component analysis (ICA), a signal deconvolution algorithm, outperformed most other module detection algorithms in identifying groups of coregulated genes. ICA is a blind source separation algorithm used to deconvolute mixed signals into their individual sources and determine their relative strengths. Prior application of ICA to microarray expression data has identified co-expressed, functionally related gene sets that often map to metabolic pathways. The overall expression levels, or activities, of the gene sets have been leveraged to classify tumor samples and connect transcriptional modules to disease states. A current challenge for analyzing transcriptional regulation is to separate the condition-invariant network structure from its condition-dependent expression state on a genome scale. Here, we overcome this limitation for the E. coli TRN by simultaneously extracting its structure and regulator activities from a transcriptomics compendium.
This approach relies on: (1) the availability of high-quality, self-consistent, and condition-rich expression profiling datasets; (2) the use of ICA to concurrently identify regulator targets and activities; and (3) validation through the association of inferred regulator targets with observed molecular interactions. The elucidated TRN structure deconvolutes transcriptomic responses of E. coli into a summation of condition-specific effects of individual transcriptional regulators.
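To make the idea of blind source separation concrete, the following is a minimal sketch in pure Python, not the pipeline used in this work. It assumes two hypothetical gene modules with sparse, synthetic gene weights and two conditions whose expression profiles are weighted sums of those modules (the activities 1.0, 0.5, 0.3, and 1.0 are made up). It recovers the modules by whitening the two profiles and grid-searching the rotation that maximizes excess kurtosis, a simple non-Gaussianity criterion that only works in this two-signal toy setting. All function and variable names are illustrative.

```python
import math
import random


def center(values):
    """Subtract the mean so the signal has zero average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot_mean(u, v):
    """Average product of two equally long signals."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def rotate(u, v, angle):
    """Rotate the pair of signals (u, v) by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return ([c * a + s * b for a, b in zip(u, v)],
            [-s * a + c * b for a, b in zip(u, v)])


def whiten(x1, x2):
    """Make two centered signals uncorrelated with unit variance."""
    a, b, d = dot_mean(x1, x1), dot_mean(x1, x2), dot_mean(x2, x2)
    # Principal-axis angle of the 2x2 covariance matrix [[a, b], [b, d]].
    p1, p2 = rotate(x1, x2, 0.5 * math.atan2(2 * b, a - d))
    sd1, sd2 = math.sqrt(dot_mean(p1, p1)), math.sqrt(dot_mean(p2, p2))
    return [p / sd1 for p in p1], [p / sd2 for p in p2]


def excess_kurtosis(values):
    """Fourth-moment measure of non-Gaussianity for a centered signal."""
    var = dot_mean(values, values)
    return sum(v ** 4 for v in values) / len(values) / var ** 2 - 3.0


def ica_two_signals(x1, x2, steps=90):
    """Whiten, then grid-search the rotation maximizing non-Gaussianity."""
    z1, z2 = whiten(center(x1), center(x2))
    best_score, best_pair = -1.0, (z1, z2)
    for i in range(steps):
        y1, y2 = rotate(z1, z2, (math.pi / 2) * i / steps)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best_pair = score, (y1, y2)
    return best_pair


def correlation(u, v):
    """Pearson correlation between two signals."""
    u, v = center(u), center(v)
    return dot_mean(u, v) / math.sqrt(dot_mean(u, u) * dot_mean(v, v))


def sparse_weights(rng, n_genes, fraction=0.1):
    """Synthetic gene weights: most genes near zero, a few strongly weighted."""
    return [rng.gauss(0, 1) if rng.random() < fraction else rng.gauss(0, 0.05)
            for _ in range(n_genes)]


rng = random.Random(0)
module_a = sparse_weights(rng, 1000)  # hypothetical module, not real data
module_b = sparse_weights(rng, 1000)  # hypothetical module, not real data
# Each condition's profile is a weighted sum of both modules (made-up activities).
cond_1 = [1.0 * a + 0.5 * b for a, b in zip(module_a, module_b)]
cond_2 = [0.3 * a + 1.0 * b for a, b in zip(module_a, module_b)]

comp_1, comp_2 = ica_two_signals(cond_1, cond_2)

# Signal summation: the centered profile of condition 1 is rebuilt as
# activity-weighted components, with activities obtained by projection.
centered_1 = center(cond_1)
act = (dot_mean(centered_1, comp_1), dot_mean(centered_1, comp_2))
rebuilt_1 = [act[0] * p + act[1] * q for p, q in zip(comp_1, comp_2)]
max_error = max(abs(r - c) for r, c in zip(rebuilt_1, centered_1))

for name, comp in (("component 1", comp_1), ("component 2", comp_2)):
    print(name,
          round(abs(correlation(comp, module_a)), 3),
          round(abs(correlation(comp, module_b)), 3))
print("max reconstruction error:", max_error)
```

Each recovered component should correlate strongly (up to sign and scale) with exactly one of the synthetic modules, and the projection step illustrates how a condition's profile can be expressed as a sum of component contributions. A genome-scale analysis would need an established ICA implementation that handles many components; the rotation search here is chosen only because it keeps the example dependency-free.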
