Customized strategies for discovering distant ncRNA homologs.

A large fraction of non-coding RNAs are short and/or poorly conserved in sequence. Most of the longer examples, furthermore, consist of a collection of conserved structural motifs rather than a coherent, globally conserved secondary structure. As a consequence, the conceptually simple problem of homology search becomes a complex and technically demanding task. Despite the best efforts of databases such as Rfam, the situation is complicated further by the sparsity of information in many RNA families, prokaryotic ones in particular. In this contribution, we review recent efforts to customize sequence-based search tools for ncRNA applications. In particular, semi-global alignments and the development of methods for fragmented pattern search have brought significant practical advances. Current developments in this area focus on integrating fragmented sequence pattern search with search algorithms for secondary structure patterns. We focus here on strategies that can succeed in the 'twilight zone', where generic approaches from BLAST to Infernal start to fail.
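
To make the notion of a semi-global alignment concrete, the following is a minimal sketch, not the implementation of any tool covered by the review. It assumes the common reading of the term for short queries: the whole query must be aligned, while unaligned target sequence on either side is free. The function name `semi_global_score` and the scoring values (match, mismatch, linear gap penalty) are illustrative assumptions.

```python
def semi_global_score(query, target, match=1, mismatch=-1, gap=-2):
    """Best score for aligning ALL of `query` to ANY substring of `target`.

    Returns (score, end), where `end` is the 1-based position in `target`
    at which the best alignment ends. Scoring values are illustrative only.
    """
    # Row 0 is all zeros: skipping a prefix of the target costs nothing.
    prev = [0] * (len(target) + 1)
    for i, q in enumerate(query, start=1):
        # Column 0: the first i query characters are aligned to gaps.
        curr = [i * gap]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            up = prev[j] + gap        # query character against a gap
            left = curr[j - 1] + gap  # target character against a gap
            curr.append(max(diag, up, left))
        prev = curr
    # Taking the maximum over the last row makes the target suffix free too.
    end = max(range(len(prev)), key=prev.__getitem__)
    return prev[end], end


if __name__ == "__main__":
    # A short query embedded in a longer target is recovered in full.
    print(semi_global_score("ACGU", "GGGACGUGGG"))
```

Unlike a purely local alignment, this scheme cannot truncate the query, which matters for short ncRNAs where a partial hit carries little information. Unlike a global alignment, it does not penalize the flanking genomic sequence.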
