Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress

Background

In our previous studies, we found that the sites in prokaryotic genomes that are most susceptible to duplex destabilization under the negative superhelical stresses occurring in vivo are associated, with high statistical significance, with intergenic regions known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters.

Results

We show that the propensity for stress-induced DNA duplex destabilization (SIDD) is closely associated with specific promoter regions. The extent of destabilization in promoter-containing regions is bimodally distributed. Compared with DNA curvature, deformability, thermostability, or sequence motif scores within the -10 region, SIDD is the most informative DNA property regarding promoter locations in the E. coli K12 genome. SIDD properties alone perform better at detecting promoter regions than other programs trained on this genome. Because this approach has a very low false positive rate, it can be used to predict with high confidence the subset of promoters that are strongly destabilized. When SIDD properties are combined with -10 motif scores in a linear classification function, they predict promoter regions with better than 80% accuracy. When these methods were tested on promoter and non-promoter sequences from Bacillus subtilis, they achieved similar or higher accuracies. We also present a strictly SIDD-based predictor for annotating promoter sequences in complete microbial genomes.

Conclusion

In this report we show that the propensity to undergo stress-induced duplex destabilization (SIDD) is a distinctive structural attribute of many prokaryotic promoter sequences. We have developed methods to identify promoter sequences in prokaryotic genomes that use SIDD either as the sole predictor or in combination with other DNA structural and sequence properties. Although these methods cannot predict all the promoter-containing regions in a genome, they do find large sets of candidate regions that have high probabilities of being true positives. This approach could be especially valuable for annotating genomes for which experimental data are limited.
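
The combination of a SIDD property with a -10 motif score in a linear classification function can be illustrated with a minimal sketch. The abstract does not say which linear function was fitted, so Fisher's linear discriminant is used here only as one common choice. The feature definitions (a "destabilization" value where larger means more destabilized, and a motif score), the function names, and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean

def fit_linear_discriminant(pos, neg):
    """Fit a two-feature Fisher linear discriminant.

    pos, neg: lists of (destabilization, motif_score) tuples for
    promoter and non-promoter regions (hypothetical features).
    Returns (w, b); a region x is called a promoter when
    w[0]*x[0] + w[1]*x[1] + b > 0.
    """
    def centroid(rows):
        return [mean(r[i] for r in rows) for i in range(2)]

    def scatter(rows, mu):
        # Sum of outer products of deviations from the class centroid.
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - mu[0], r[1] - mu[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s

    mu_p, mu_n = centroid(pos), centroid(neg)
    sp, sn = scatter(pos, mu_p), scatter(neg, mu_n)
    # Pooled within-class scatter matrix and its 2x2 inverse.
    sw = [[sp[i][j] + sn[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_p[0] - mu_n[0], mu_p[1] - mu_n[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Place the decision boundary midway between the two centroids.
    mid = [(mu_p[i] + mu_n[i]) / 2 for i in range(2)]
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b

def is_promoter(w, b, x):
    """Apply the fitted linear classification function to one region."""
    return w[0] * x[0] + w[1] * x[1] + b > 0

# Made-up training values, for illustration only.
promoters = [(8.0, 5.0), (9.0, 3.5), (7.5, 6.0), (2.0, 6.5)]
non_promoters = [(1.0, 2.0), (2.0, 1.0), (0.5, 3.0), (1.5, 2.5)]
w, b = fit_linear_discriminant(promoters, non_promoters)
```

In this toy data the fourth promoter is only weakly destabilized but has a strong motif score, which is the kind of case where a combined function could help over a strictly SIDD-based predictor. Calling `is_promoter(w, b, (8.0, 5.0))` applies the fitted function to a single region.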
