Relative stability of DNA as a generic criterion for promoter prediction: whole genome annotation of microbial genomes with varying nucleotide base composition.

The rapid increase in genome sequence information has made it necessary to annotate the functional elements of genomes, particularly those in non-coding regions, in their genomic context. The promoter is the key regulatory region that enables a gene to be transcribed or repressed, but it is difficult to determine experimentally. In silico identification of promoters is therefore crucial for guiding experimental work and for pinpointing the region that controls transcription initiation of a gene. In this analysis, we demonstrate that although promoter regions are in general less stable than their flanking regions, their average free energy varies with the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house tool, PromPredict. When PromPredict was applied to predict promoter regions corresponding to the 1144 and 612 experimentally validated transcription start sites (TSSs) in E. coli (50.8% GC) and B. subtilis (43.5% GC), sensitivities of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC), a sensitivity of 100% and a precision of 49% were obtained.
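The abstract states that promoters are identified by comparing the average free energy of DNA against a GC-dependent threshold, but it does not give the computation. The following is a minimal sketch under stated assumptions, not PromPredict itself: it scores stability with the unified nearest-neighbour free energies of SantaLucia (1998) summed over a 15-bp window, averages those window energies over a stretch, and flags stretches that are less stable (less negative) than a cutoff chosen by GC content. The window size, averaging length, cutoff table and all function names (`free_energy_profile`, `select_cutoff`, `low_stability_sites` and so on) are hypothetical, and for simplicity the GC content of the whole input stands in for the flanking composition. The `sensitivity` and `precision` helpers use the standard definitions TP/(TP+FN) and TP/(TP+FP), assumed to correspond to the metrics reported in the abstract.

```python
"""Minimal sketch of stability-based promoter scanning.

All function names, the 15-bp window, the averaging length and the cutoff
table are illustrative assumptions, not PromPredict's published values.
"""

# Unified nearest-neighbour free energies (SantaLucia, 1998), dG at 37 C in
# kcal/mol, expanded to all 16 dinucleotides using reverse complements
# (e.g. the CA/GT value applies to both 5'-CA-3' and 5'-TG-3').
NN_DG = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}


def free_energy_profile(seq: str, window: int = 15) -> list[float]:
    """Free energy of each window-bp stretch: the sum of its window-1 stacking steps.

    Assumes seq contains only A, C, G and T (other symbols raise KeyError).
    """
    seq = seq.upper()
    steps = [NN_DG[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n = window - 1
    return [sum(steps[i:i + n]) for i in range(len(steps) - n + 1)]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def select_cutoff(gc: float, cutoffs: list[tuple[float, float]]) -> float:
    """Return the cutoff of the first GC bin (sorted by upper bound) containing gc."""
    for gc_upper, cutoff in cutoffs:
        if gc <= gc_upper:
            return cutoff
    return cutoffs[-1][1]


def low_stability_sites(seq: str, cutoffs: list[tuple[float, float]],
                        window: int = 15, avg_len: int = 50) -> list[int]:
    """Start indices where the mean of avg_len consecutive window energies
    exceeds the GC-dependent cutoff (less negative means less stable)."""
    profile = free_energy_profile(seq, window)
    cutoff = select_cutoff(gc_content(seq), cutoffs)
    hits = []
    for i in range(len(profile) - avg_len + 1):
        if sum(profile[i:i + avg_len]) / avg_len > cutoff:
            hits.append(i)
    return hits


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of known TSSs recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predictions that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical placeholder cutoffs: (GC upper bound in %, kcal/mol).
    demo_cutoffs = [(45.0, -17.0), (55.0, -18.0), (100.0, -19.5)]
    toy = "GC" * 40 + "AT" * 40 + "GC" * 40  # AT-rich island in GC-rich flanks
    hits = low_stability_sites(toy, demo_cutoffs, avg_len=10)
    print(f"GC = {gc_content(toy):.1f}%, low-stability stretches start at {hits[0]}..{hits[-1]}")
```

Keeping the cutoff as a lookup keyed on GC content, rather than one global value, reflects the abstract's observation that the average free energy of promoter regions shifts with base composition; usable cutoffs would have to be calibrated on genomes with experimentally validated TSSs.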
