A novel method for prokaryotic promoter prediction based on DNA stability

Background
In the post-genomic era, accurate gene prediction has become one of the biggest challenges in genome annotation. Improved promoter prediction methods can be a step towards more reliable ab initio gene prediction. This work presents a novel prokaryotic promoter prediction method based on DNA stability.

Results
The promoter region is less stable, and hence more prone to melting, than other genomic regions. Our analysis shows that a promoter prediction method based on differences in DNA stability between promoter and non-promoter regions performs considerably better than existing prokaryotic promoter prediction programs, which rely on sequence motif searches. At present the method works optimally for genomes such as that of Escherichia coli, which have a G+C content near 50%, and it also performs satisfactorily for promoters of other prokaryotes.

Conclusions
Our analysis shows that the change in DNA stability provides a much better clue than the usual sequence motifs, such as the Pribnow box and the -35 sequence, for distinguishing promoter regions from non-promoter regions. To a certain extent this feature is also more general and is likely to be applicable across organisms. Incorporating such stability features alongside the signature motifs could therefore greatly improve presently available promoter prediction programs.
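The idea that promoter DNA is less stable than its surroundings can be illustrated with a sliding-window free-energy profile. The following is a minimal sketch, not the authors' published algorithm. It assumes nearest-neighbour ΔG°(37 °C) values from SantaLucia's unified parameter set (check them against the published table before real use), ignores initiation and terminal corrections, and uses a hypothetical 15-bp window and a hypothetical 1 kcal/mol margin above the sequence mean as the "low stability" cutoff.

```python
# Minimal sketch: sliding-window DNA duplex stability profile.
# ASSUMPTION: nearest-neighbour dG(37 C) values in kcal/mol from SantaLucia's
# unified table; initiation/terminal corrections are ignored.
# ASSUMPTION: input contains only A/C/G/T (other symbols raise KeyError).
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88,
    "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17,
    "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def window_free_energy(seq: str) -> float:
    """Sum of nearest-neighbour dG over every dinucleotide step in seq."""
    seq = seq.upper()
    return sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))


def stability_profile(seq: str, window: int = 15) -> list[float]:
    """dG of each window of length `window` (hypothetical default: 15 bp)."""
    return [window_free_energy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def low_stability_windows(seq: str, window: int = 15,
                          margin: float = 1.0) -> list[int]:
    """Start positions of windows whose dG is less negative (less stable)
    than the mean over the whole input by more than `margin` kcal/mol.
    The margin is a hypothetical cutoff, not a published threshold."""
    profile = stability_profile(seq, window)
    mean = sum(profile) / len(profile)
    return [i for i, dg in enumerate(profile) if dg > mean + margin]


# Toy input: an AT-rich stretch (positions 40-59) flanked by GC-rich DNA.
toy = "GC" * 20 + "AT" * 10 + "GC" * 20
print(low_stability_windows(toy))
```

On this toy input only windows overlapping the AT-rich stretch are flagged. Because the cutoff here is relative to the mean of the input sequence, a genome whose G+C content is far from 50% shifts the whole profile, so any fixed threshold would likely need recalibration per genome.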

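For contrast, the motif-search approach that the Results compare against can be illustrated with a simple consensus scan for the -35 sequence and the Pribnow (-10) box. This is a minimal sketch, not any specific published program. It assumes the commonly cited σ70 consensus sequences TTGACA (-35) and TATAAT (-10), a spacer of 15-19 bp between them (17 bp is the usual optimum), and a hypothetical cutoff of at most three mismatches summed over both boxes.

```python
# Minimal consensus-matching sketch for sigma70-type promoter motifs.
# ASSUMPTIONS: -35 consensus TTGACA, -10 (Pribnow box) consensus TATAAT,
# spacer of 15-19 bp, hypothetical cutoff on total mismatches.
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_motifs(seq: str, spacers=range(15, 20),
                max_mismatches: int = 3) -> list[tuple[int, int, int]]:
    """Return (start of -35 box, spacer length, total mismatches) for every
    candidate within the cutoff, best (fewest mismatches) first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS35) + 1):
        m35 = mismatches(seq[i:i + 6], MINUS35)
        for sp in spacers:
            j = i + 6 + sp  # start of the -10 box
            if j + 6 > len(seq):
                break  # longer spacers would run off the end as well
            total = m35 + mismatches(seq[j:j + 6], MINUS10)
            if total <= max_mismatches:
                hits.append((i, sp, total))
    return sorted(hits, key=lambda h: (h[2], h[0]))


# Toy input: exact consensus boxes separated by a 17-bp spacer.
toy = "C" * 10 + MINUS35 + "G" * 17 + MINUS10 + "C" * 10
print(scan_motifs(toy)[0])
```

Short, degenerate consensus matches of this kind also occur frequently by chance in real genomic DNA, which illustrates why motif searches alone can struggle to separate promoter from non-promoter regions.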