FastR: fast database search tool for non-coding RNA

The discovery of novel non-coding RNAs has been among the most exciting recent developments in biology, yet many more remain undiscovered. It has been hypothesized that there is in fact an abundance of functional non-coding RNA (ncRNA) with various catalytic and regulatory functions. Computational methods tailored specifically for ncRNA are being actively developed. As the inherent signal for ncRNA is weaker than that for protein-coding genes, comparative methods offer the most promising approach, and are the subject of our research. We consider the following problem: given an RNA sequence with a known secondary structure, efficiently compute all structural homologs (computed as a function of sequence and structural similarity) in a genomic database. Our approach is based on structural filters that eliminate a large portion of the database while retaining the true homologs. It allows us to search a typical bacterial database in minutes on a standard PC, with high sensitivity and specificity. This is two orders of magnitude faster than currently available software for the problem.
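To illustrate the general idea of a structural filter, the following is a minimal Python sketch, not the FastR filter itself. It assumes a much simpler criterion than the abstract describes: a database window is kept only if it contains two exactly complementary k-mers that could form a stem, separated by a gap within a given range. The function names (`stem_candidates`, `passes_filter`) and all parameter values are hypothetical, the input is assumed to be in the RNA alphabet, and only Watson-Crick pairs are considered (no G-U wobble pairs).

```python
# Simplified stem-based filter: an illustration only, under the
# assumptions stated above (exact Watson-Crick pairing, RNA alphabet).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(kmer):
    """Return the sequence that would pair with kmer in an antiparallel stem."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def stem_candidates(seq, k, min_gap, max_gap):
    """Return (i, j) pairs such that seq[i:i+k] can pair with seq[j:j+k]
    and the number of bases between the two k-mers lies in [min_gap, max_gap]."""
    # Index every k-mer by its start positions.
    positions = {}
    for j in range(len(seq) - k + 1):
        positions.setdefault(seq[j:j + k], []).append(j)

    hits = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in COMPLEMENT for base in kmer):
            continue  # skip k-mers with ambiguous or unknown symbols
        partner = reverse_complement(kmer)
        for j in positions.get(partner, []):
            gap = j - (i + k)
            if min_gap <= gap <= max_gap:
                hits.append((i, j))
    return hits


def passes_filter(window, k=4, min_gap=3, max_gap=20):
    """Keep a window only if it contains at least one candidate stem."""
    return bool(stem_candidates(window, k, min_gap, max_gap))


if __name__ == "__main__":
    print(stem_candidates("GGGGAAAACCCC", 4, 3, 20))
    print(passes_filter("AAAAAAAAAAAA"))
```

Only windows that pass such a cheap test would be handed to a more expensive structural alignment, which is where a filter-based search saves time. A practical filter would need to be tuned so that true homologs are rarely discarded.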
