Search of regular sequences in promoters from eukaryotic genomes

In this paper, the notion of "regularity" is introduced to describe structural features of DNA sequences. This notion extends the term "latent periodicity". A novel method for revealing regularity, based on the runs test, is described. A search for regular sequences in eukaryotic promoters showed that more than 60% of them possess the regularity property at a statistically significant level. Possible biological functions of regularity are discussed, together with the possibility of using this characteristic for promoter annotation.
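
For readers unfamiliar with the runs test, the following is a minimal sketch of the classical Wald-Wolfowitz runs test applied to a two-class encoding of a DNA sequence. It is not the authors' method: the purine/pyrimidine encoding, the normal approximation, and the function names `runs_test` and `purine_mask` are assumptions made for illustration only.

```python
import math


def purine_mask(seq):
    """Hypothetical two-class encoding: 1 for purines (A, G), 0 otherwise."""
    return [1 if c in "AG" else 0 for c in seq.upper()]


def runs_test(binary):
    """Wald-Wolfowitz runs test on a sequence of 0/1 values.

    Returns (number of runs, z-score, two-sided p-value) using the
    normal approximation, which is an assumption that is reasonable
    only when both classes are well represented.
    """
    n1 = sum(1 for b in binary if b)
    n2 = len(binary) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both symbol classes must occur in the sequence")

    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for a, b in zip(binary, binary[1:]) if a != b)

    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var <= 0.0:
        raise ValueError("sequence too short for the normal approximation")

    z = (runs - mean) / math.sqrt(var)
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p
```

A strongly alternating sequence yields more runs than expected (positive z), while a sequence made of long homogeneous blocks yields fewer (negative z); either departure from randomness gives a small p-value.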
