Identification and Classification of Conserved RNA Secondary Structures in the Human Genome

The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome, and we used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and pufferfish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. The screen recovers a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript autoregulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.
