Tree decomposition based fast search of RNA structures including pseudoknots in genomes

Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, for lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of searching genomes for RNA secondary structures both quickly and effectively have not been available. In this paper, a novel RNA structure profiling model is introduced, based on the notion of a conformational graph that specifies the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformational graphs (e.g., t = 2 for stem loops, with only a slight increase for pseudoknots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph, and can consequently be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k^t N^2), where k is a small parameter and N is the size of the profiled RNA structure. Experiments show that applying the alignment algorithm to genome search yields the same search accuracy as methods based on a covariance model, with a significant reduction in computation time. In particular, very accurate searches for tmRNAs in bacterial genomes and for telomerase RNAs in yeast genomes can be accomplished in days, as opposed to the months required by other methods. The tree decomposition based searching tool is available free upon request and can be downloaded from our site: http://www.uga.edu/RNA-Informatics/software/index.php.
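
To make the notion of tree width concrete, the following minimal sketch builds a simplified conformational graph for two nested stems and checks that a hand-written tree decomposition is valid and has width 2. The graph encoding (one vertex per stem half, backbone edges between consecutive halves, one edge per stem pairing), the vertex names, and the helper functions are illustrative assumptions, not the model or code of the paper; the alignment algorithm itself is not shown.

```python
from collections import defaultdict, deque

def conformational_graph(order, pairs):
    """Simplified (assumed) encoding: backbone edges between consecutive
    stem halves in 5'->3' order, plus one edge per stem pairing."""
    edges = {frozenset(e) for e in zip(order, order[1:])}
    edges |= {frozenset(p) for p in pairs}
    return edges

def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the three defining conditions of a tree decomposition.
    `bags` maps tree-node id -> set of graph vertices; `tree_edges`
    is assumed to describe a tree over those node ids."""
    # 1. The bags together cover exactly the graph's vertices.
    if set().union(*bags.values()) != set(vertices):
        return False
    # 2. Every graph edge lies inside at least one bag.
    if not all(any(e <= bag for bag in bags.values()) for e in edges):
        return False
    # 3. For each vertex, the tree nodes whose bags hold it are connected.
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for v in vertices:
        holding = {i for i, bag in bags.items() if v in bag}
        start = next(iter(holding))
        seen, queue = {start}, deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur] & holding:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if seen != holding:
            return False
    return True

def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1

# Two nested stems a and b, read 5'->3' (hypothetical names).
order = ["a1", "b1", "b2", "a2"]
pairs = [("a1", "a2"), ("b1", "b2")]
edges = conformational_graph(order, pairs)

# A hand-written decomposition with two bags joined by one tree edge.
bags = {0: {"a1", "b1", "a2"}, 1: {"b1", "b2", "a2"}}
tree_edges = [(0, 1)]

print(is_tree_decomposition(order, edges, bags, tree_edges))
print(width(bags))
```

Because a dynamic program over a tree decomposition enumerates assignments for the vertices of one bag at a time, the bag size bounds the exponent in the running time, which is why a small t matters for the O(k^t N^2) bound stated above.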
