Local thermodynamic stability scores are well represented by a non-central Student's t distribution.

Local folding in mRNAs is closely associated with biological functions. In this study, we characterize the full distribution of local thermodynamic stability in the complete genome of poliovirus P3/Leon/37 and in the single-stranded RNA sequences that correspond to the complete genome sequence (1 667 867 bp) of Helicobacter pylori (H. pylori) strain 26695. Local thermodynamic stability in the RNA sequences is measured by two standard z-scores, the significance score and the stability score. To estimate the distribution of thermodynamic stability, we developed a model based on the non-central Student's t distribution. We detect significant patterns of extremes that are either much more stable or much less stable than expected by chance. Our results indicate that the highly stable and statistically more significant folding regions lie predominantly in non-coding sequences in both genomes. In contrast, the highly unstable folding regions lie predominantly in the protein-coding sequences of H. pylori. The observed differences across the complete genomic sequences are highly significant by a chi-square test. These extreme patterns may be useful in searching for target sequences for long-chain antisense RNA and for locating potential RNA functional elements involved in the regulation of gene expression, including translation, mRNA localization and metabolism.
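The two z-scores named above can be illustrated with a minimal sketch. The abstract does not define them, so the definitions here are assumptions: the significance score is taken to compare a segment's folding energy with the energies of randomly shuffled copies of that segment, and the stability score to compare it with the energies of all other same-size windows in the sequence. The function `toy_energy` is a hypothetical placeholder, not a real RNA folding model; an actual analysis would substitute a minimum free energy from a folding program. All function names are illustrative.

```python
import random
import statistics

# Hypothetical pairing energies for the toy model below (not real thermodynamic parameters).
PAIR_ENERGY = {"GC": -3.0, "CG": -3.0, "AU": -2.0, "UA": -2.0, "GU": -1.0, "UG": -1.0}


def toy_energy(seq: str) -> float:
    """Placeholder 'folding energy': pairs base i with base len-1-i, hairpin style.

    ASSUMPTION: stands in for a real minimum free energy calculation.
    """
    n = len(seq)
    return sum(PAIR_ENERGY.get(seq[i] + seq[n - 1 - i], 0.0) for i in range(n // 2))


def z_score(value: float, sample: list[float]) -> float:
    """Standard z-score of value against the mean and sample std of sample."""
    sd = statistics.stdev(sample)
    if sd == 0:
        raise ValueError("sample has zero variance; z-score is undefined")
    return (value - statistics.mean(sample)) / sd


def significance_score(segment: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Assumed definition: segment energy versus energies of shuffled copies."""
    rng = random.Random(seed)
    energies = []
    for _ in range(n_shuffles):
        bases = list(segment)
        rng.shuffle(bases)  # preserves base composition, destroys order
        energies.append(toy_energy("".join(bases)))
    return z_score(toy_energy(segment), energies)


def stability_score(sequence: str, start: int, window: int) -> float:
    """Assumed definition: window energy versus all other same-size windows."""
    others = [
        toy_energy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
        if i != start
    ]
    return z_score(toy_energy(sequence[start:start + window]), others)


if __name__ == "__main__":
    genome = "AAAAAAAAAGGGAAACCCAAAAAAAAA"
    print(significance_score("GGGAAACCC"))
    print(stability_score(genome, start=9, window=9))
```

Under these assumed definitions, strongly negative scores mark segments that are more stable than expected, and strongly positive scores mark segments that are less stable. The distribution of such scores along a genome is the quantity the abstract models with a non-central Student's t distribution.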
