Fragrep: An Efficient Search Tool for Fragmented Patterns in Genomic Sequences

Many classes of non-coding RNAs (ncRNAs; including Y RNAs, vault RNAs, RNase P RNAs, and MRP RNAs, as well as a novel class recently discovered in Dictyostelium discoideum) can be characterized by a pattern of short but well-conserved sequence elements that are separated by poorly conserved regions of sometimes highly variable lengths. Local alignment algorithms such as BLAST are therefore ill-suited for the discovery of new homologs of such ncRNAs in genomic sequences. The Fragrep tool instead implements an efficient algorithm for detecting the pattern fragments that occur in a given order. For each pattern fragment, the mismatch tolerance and bounds on the length of the intervening sequences can be specified separately. Furthermore, matches can be ranked by a statistically well-motivated scoring scheme.
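The search problem described above can be illustrated with a minimal sketch. This is a naive enumeration, not Fragrep's actual algorithm or interface, and it omits the scoring scheme. The names `Fragment`, `hamming_hits`, and `find_fragmented_matches` are hypothetical. The sketch assumes that mismatches are counted as substitutions only and that each fragment's gap bounds apply to the spacer preceding it (they are ignored for the first fragment).

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    pattern: str         # conserved sequence element
    max_mismatches: int  # mismatch tolerance for this fragment
    min_gap: int         # lower bound on the spacer preceding this fragment
    max_gap: int         # upper bound on the spacer preceding this fragment


def hamming_hits(sequence: str, pattern: str, max_mismatches: int) -> List[int]:
    """Start positions where pattern occurs with at most max_mismatches substitutions."""
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


def find_fragmented_matches(sequence: str,
                            fragments: List[Fragment]) -> List[Tuple[int, ...]]:
    """Return one tuple of start positions per match, fragments in the given order."""
    if not fragments:
        return []
    hits_per_fragment = [hamming_hits(sequence, f.pattern, f.max_mismatches)
                         for f in fragments]
    # Each partial chain holds the start positions of the fragments placed so far.
    chains = [(h,) for h in hits_per_fragment[0]]
    for index in range(1, len(fragments)):
        frag = fragments[index]
        prev_len = len(fragments[index - 1].pattern)
        extended = []
        for chain in chains:
            prev_end = chain[-1] + prev_len  # first position after the previous fragment
            for h in hits_per_fragment[index]:
                if frag.min_gap <= h - prev_end <= frag.max_gap:
                    extended.append(chain + (h,))
        chains = extended
    return chains


# Toy example: an exact first fragment, then a second fragment with one
# tolerated mismatch and a spacer of 3 to 8 nucleotides.
toy_sequence = "TTGGCTGAAAAACCTGATT"
toy_fragments = [Fragment("GGCTG", 0, 0, 0), Fragment("CCTGA", 1, 3, 8)]
print(find_fragmented_matches(toy_sequence, toy_fragments))  # [(2, 12)]
```

In the toy example, the second fragment also matches at position 3 with one mismatch, but that hit overlaps the first fragment and is rejected by the gap bounds. Enumerating all chains in this way can grow quickly on genomic input, which is why an efficient algorithm is needed in practice.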
