Identification and Classification of Conserved RNA Secondary Structures in the Human Genome

The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method, based on phylogenetic stochastic context-free grammars, for identifying functional RNAs encoded in the human genome. We used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebrafish, and pufferfish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search yielded a set of 48,479 candidate RNA structures. The screen recovers a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript autoregulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization.
