Strategies for Homology-Based ncRNA Gene Annotation

Most non-coding RNAs are short and/or poorly conserved in sequence. Most of the longer examples, furthermore, consist of a collection of conserved structural motifs rather than a coherent, globally conserved secondary structure. As a consequence, the conceptually simple problem of homology search becomes a complex and technically demanding task. Despite the best efforts of databases such as Rfam, the situation is complicated further by the sparsity of information on many RNA families, in particular prokaryotic ones. In this contribution we review recent efforts to customize sequence-based search tools for ncRNA applications. In particular, semi-global alignments and the development of methods for fragmented pattern search have brought significant practical advances. Current developments in this area focus on the integration of fragmented sequence pattern search with search algorithms for secondary structure patterns. As one example, we introduce here fragrep3.
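
To make the idea of fragmented pattern search concrete, the following is a minimal sketch and not the fragrep3 algorithm itself. It assumes a pattern given as an ordered list of short sequence fragments separated by gaps with lower and upper length bounds, and it scores each fragment by plain Hamming distance, which is a simplifying assumption. The function names `hamming_hits` and `fragmented_search` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def hamming_hits(fragment: str, target: str, max_mismatches: int) -> List[int]:
    """Return all start positions where `fragment` matches `target`
    with at most `max_mismatches` substitutions (no indels)."""
    hits = []
    for start in range(len(target) - len(fragment) + 1):
        window = target[start:start + len(fragment)]
        mismatches = sum(a != b for a, b in zip(fragment, window))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def fragmented_search(
    fragments: Sequence[str],
    gaps: Sequence[Tuple[int, int]],
    target: str,
    max_mismatches: int = 0,
) -> List[Tuple[int, ...]]:
    """Find all placements of the ordered fragments in `target`.

    gaps[i] = (min_gap, max_gap) bounds the number of unmatched
    nucleotides between fragment i and fragment i + 1.
    Each result is a tuple with one start position per fragment.
    """
    if not fragments:
        return []
    if len(gaps) != len(fragments) - 1:
        raise ValueError("need exactly one gap interval between consecutive fragments")

    hits_per_fragment = [hamming_hits(f, target, max_mismatches) for f in fragments]

    # Extend partial placements one fragment at a time, left to right.
    chains: List[Tuple[int, ...]] = [(s,) for s in hits_per_fragment[0]]
    for i in range(1, len(fragments)):
        min_gap, max_gap = gaps[i - 1]
        prev_len = len(fragments[i - 1])
        extended = []
        for chain in chains:
            prev_end = chain[-1] + prev_len  # first position after the previous fragment
            for s in hits_per_fragment[i]:
                if min_gap <= s - prev_end <= max_gap:
                    extended.append(chain + (s,))
        chains = extended
    return chains


if __name__ == "__main__":
    target = "GGAUCCAAAAGCUAGG"
    # Two conserved blocks separated by 2 to 6 unconserved nucleotides.
    print(fragmented_search(["AUCC", "GCUA"], [(2, 6)], target))  # [(2, 10)]
```

The sketch enumerates every consistent placement, which is adequate for short illustrative inputs. A practical tool would need position-specific scoring, tolerance for insertions and deletions within fragments, and a more efficient way to combine fragment hits on genome-scale targets.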
