Comparative genomic reconstruction of transcriptional regulatory networks in bacteria.

Microorganisms in most ecological niches are constantly exposed to variation in many environmental factors, including temperature, oxygen, nutrient and water availability, the presence of toxic compounds, and interactions with other organisms. Changing gene-expression patterns is a major adaptive response to these variations. Expression of genes in bacteria is controlled by a variety of mechanisms acting at the level of transcription or translation. In most cases, the switch in gene expression is mediated by specific regulatory proteins that receive an appropriate intra- or extracellular signal and trigger a specific transcriptional response.1 The key components of the transcriptional regulatory machinery in prokaryotes are transcription factors (TFs) and transcription factor-binding sites (TFBSs), sigma factors and promoters, antiterminator proteins, and cis- and trans-acting regulatory RNAs. TFs are proteins that recognize specific cis-regulatory DNA sequences (TFBSs) to either stimulate or repress transcription of genes.2 Sigma factors are prokaryotic transcription initiation factors that must be a part of the RNA polymerase holoenzyme for specific binding to promoter sites encoded in the 5′-untranslated regions (UTRs) of genes.3 Antiterminator protein factors bind to specific secondary structures in the leader region of mRNA to restart transcription of a gene.4 Cellular signals that can modulate TFs include binding of a small molecule (effector), interaction with other proteins (e.g. phosphorylation), and changes in redox state, temperature, and other conditions.5 Finally, various regulatory RNA structures including cis-acting metabolite-sensing riboswitches, T-boxes, and attenuators6,7 and trans-acting small RNAs8 control gene expression without involvement of specific proteins. The operon is a set of adjacent genes that are transcribed as a single polycistronic mRNA.
This organization of genes into operons gives bacteria a simple solution to the problem of co-regulating genes that participate in the same metabolic process. However, the operon strategy has some limitations, e.g., the inability to combine independent regulation with coordinated control. Regulon organization provides a level of control above the operon and permits coordinated control of operons that each retain their own unique control. A regulon is a group of operons controlled by a common TF or regulatory RNA. A regulon usually includes genes that are implicated in a common cellular subsystem or pathway. For example, in most species the arginine and thiamin biosynthesis genes are controlled by the ArgR repressor and the THI riboswitch, respectively.9,10 However, most responses of bacterial cells to even a simple environmental stimulus are complex. Thus, the operational term stimulon was defined to refer to an ensemble of genes (often involving multiple regulons and independent operons) that respond to a common environmental stimulus, although they may not share a common mechanism of regulation. For example, the heat-shock and phosphate-starvation stimulons include hundreds of genes in Escherichia coli, but only some of them are known members of the σ32 and PhoB regulons, respectively.11 Finally, the term modulon was introduced to define a set of genes that are either directly or indirectly controlled by a certain regulatory system. Fine-tuned environmental responses require efficient, flexible, and robust transcriptional regulatory networks (TRNs) that contain both internal checkpoints and feedback mechanisms to orchestrate the level of gene expression.
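The relationship between operons, regulons, and stimulons defined above can be made concrete with a small data model. The following Python sketch is only an illustration under stated assumptions: the class names `Operon` and `Regulon`, the helper `stimulon_outside_regulon`, the regulator `TF_X`, and all gene names are hypothetical placeholders rather than real annotations.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Operon:
    """Adjacent genes transcribed as a single polycistronic mRNA."""
    name: str
    genes: Tuple[str, ...]


@dataclass(frozen=True)
class Regulon:
    """A group of operons controlled by a common TF or regulatory RNA."""
    regulator: str
    operons: Tuple[Operon, ...]

    def genes(self) -> Set[str]:
        # Flatten all member operons into one set of gene names.
        return {gene for operon in self.operons for gene in operon.genes}


def stimulon_outside_regulon(stimulon: Set[str], regulon: Regulon) -> Set[str]:
    """Genes that respond to a stimulus but are not members of the regulon."""
    return stimulon - regulon.genes()


# Hypothetical toy data: placeholder names, not real genes or regulators.
op1 = Operon("op1", ("geneA", "geneB"))
op2 = Operon("op2", ("geneC",))
reg_x = Regulon("TF_X", (op1, op2))

# A stimulon is defined by response to a stimulus, not by a shared regulator.
stimulon = {"geneA", "geneC", "geneD"}

print(sorted(stimulon_outside_regulon(stimulon, reg_x)))  # ['geneD']
```

In this toy case geneD responds to the stimulus without belonging to the TF_X regulon, which mirrors the observation that only some heat-shock or phosphate-starvation stimulon genes are known members of the σ32 or PhoB regulons.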
To define a particular TRN, we need to specify which TFs bind to the promoter regions of which genes and what the integrated effect of all these TFs is on the expression of all these genes.12 Reconstruction of TRNs helps to better understand the metabolism and functions of prokaryotic organisms.13 The accumulated information about gene regulatory networks has been used to define the basic building blocks of complex TRNs, termed network motifs, and to understand their design principles.14 Traditional experimental methods for the analysis of transcriptional gene regulation (such as gene cloning, knockout, reporter fusion, and in vitro transcription) and for the characterization of TFBSs (electrophoretic mobility shift and nuclease protection assays) have been very powerful. However, they have certain limitations in terms of both productivity (the scale) and feasibility (e.g. for non-model organisms). The development of high-throughput transcriptome and proteome approaches allows thousands of genes and hundreds of proteins to be studied in a single experiment. DNA microarray technology has revealed the role of many regulatory factors in global regulatory networks in Escherichia coli, Bacillus subtilis, and other model bacteria.15 However, in many cases the complexity of the interactions between regulons makes it difficult to distinguish between direct and indirect effects on transcription.
Another high-throughput experimental approach, the ChIP-on-chip technique (see Section 3.1), is increasingly used to investigate the genome-wide DNA binding of global TFs in bacteria.16–20 A wide-ranging proteomic approach was used to assess the phosphate-starvation response in Vibrio cholerae, the iron regulatory network in Rhizobium leguminosarum, and the bacteroid protein network in Bradyrhizobium japonicum.21–23 Finally, recent advances in tandem mass spectrometry and the development of powerful computational algorithms enable de novo shotgun sequencing of protein mixtures, thus providing another promising approach for high-throughput protein expression analysis.24,25
A constantly growing number of complete prokaryotic genomes allows computational biologists to make extensive use of comparative genomic approaches to predict cis-acting regulatory elements (TFBSs, RNA elements) and to reconstruct TRNs in bacteria.12,13,26–29 The major directions of this work are the analysis and description of previously known regulons in uncharacterized organisms and the ab initio prediction of novel regulons. In addition, the comparative analysis of regulons combined with other techniques of genome context analysis (see Section 3.3) significantly improves the quality and accuracy of functional gene annotations and helps to predict novel genes in a variety of pathways.
The focus of this review is on novel approaches to the analysis of bacterial regulons, including methods for the identification of TFBSs and RNA regulatory elements based on comparative genomics (Section 2). It provides a summary of several major studies on the computational reconstruction of certain TRNs in bacteria and an overview of Web-accessible databases of microbial TFs and their TFBSs (Section 3). Finally, I discuss the likely evolutionary scenarios for bacterial regulons and the balance of conservation and flexibility in the composition of TRNs among species (Section 4).