Memory efficient alignment between RNA sequences and stochastic grammar models of pseudoknots

Stochastic Context-Free Grammars (SCFGs) have been shown to be effective in modelling RNA secondary structure for searches. Our previous work (Cai et al., 2003) on Stochastic Parallel Communicating Grammar Systems (SPCGS) extended SCFGs to model RNA pseudoknots. However, the alignment algorithm requires O(n^4) memory for a sequence of length n. In this paper, we develop a memory-efficient algorithm for sequence-structure alignments including pseudoknots. This new algorithm reduces the memory requirement from O(n^4) to O(n^2) without increasing the computation time. Our experiments have shown that this approach can achieve excellent performance in searching for RNA pseudoknots.
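
To give a sense of scale for the memory reduction described above, the following minimal sketch compares the size of a four-index table with that of a two-index table. It is an illustration only, not the alignment algorithm itself: the helper names `table_cells` and `table_bytes` and the assumption of 8 bytes per cell are hypothetical, and constant factors hidden by the O-notation are ignored.

```python
def table_cells(n: int, dims: int) -> int:
    """Cells in a table indexed by `dims` sequence positions, each in 0..n-1."""
    return n ** dims


def table_bytes(n: int, dims: int, bytes_per_cell: int = 8) -> int:
    """Approximate footprint, assuming (hypothetically) 8 bytes per cell."""
    return table_cells(n, dims) * bytes_per_cell


if __name__ == "__main__":
    for n in (100, 400, 1000):
        quartic = table_bytes(n, 4)    # growth of an O(n^4) table
        quadratic = table_bytes(n, 2)  # growth of an O(n^2) table
        print(f"n={n}: n^4 table = {quartic:.3e} bytes, "
              f"n^2 table = {quadratic:.3e} bytes, "
              f"ratio = {quartic // quadratic}")
```

Under these assumptions, a sequence of length 400 would need about 2.56 x 10^10 cells (roughly 205 GB) for a four-index table, against 160,000 cells (1.28 MB) for a two-index table; the ratio between the two grows as n^2.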

[1] R. Durbin, et al. RNA sequence analysis using covariance models, 1994, Nucleic Acids Research.

[2] Ian Holmes, et al. Pairwise RNA Structure Comparison with Stochastic Context-Free Grammars, 2001, Pacific Symposium on Biocomputing.

[3] Russell L. Malmberg, et al. RNA Structural Homology Search with a Succinct Stochastic Grammar Model, 2005, Journal of Computer Science and Technology.

[4] E. Westhof, et al. Phylogenetic analysis of tmRNA genes within a bacterial subgroup reveals a specific structural signature, 2001, Nucleic Acids Research.

[5] Durbin, et al. Biological Sequence Analysis, 1998.

[6] David Haussler, et al. Recent Methods for RNA Modeling Using Stochastic Context-Free Grammars, 1994, CPM.

[7] C. W. Hilbers, et al. NMR structure of a classical pseudoknot: interplay of single- and double-stranded RNA, 1998, Science.

[8] E. Rivas, et al. A dynamic programming algorithm for RNA structure prediction including pseudoknots, 1998, Journal of Molecular Biology.

[9] Weixiong Zhang, et al. An Iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots, 2004, Bioinform.

[10] Sean R. Eddy, et al. RSEARCH: Finding homologs of single structured RNA sequences, 2003, BMC Bioinformatics.

[11] Michael P. S. Brown, et al. Small Subunit Ribosomal RNA Modeling Using Stochastic Context-Free Grammars, 2000, ISMB.

[12] Liming Cai, et al. The Computational Complexity of PCGS with Regular Components, 1995, Developments in Language Theory.

[13] Alexander S. Spirin, et al. Eukaryotic Elongation Factor 1A Interacts with the Upstream Pseudoknot Domain in the 3′ Untranslated Region of Tobacco Mosaic Virus RNA, 2002, Journal of Virology.

[14] Russell L. Malmberg, et al. Stochastic modeling of RNA pseudoknotted structures: a grammatical approach, 2003, ISMB.

[15] Gheorghe Paun, et al. Further remarks on parallel communicating grammar systems, 1990, Int. J. Comput. Math.

[16] Christian Zwieb, et al. tmRDB (tmRNA database), 2003, Nucleic Acids Res.

[17] Elena Rivas, et al. The language of RNA: a formal grammar that includes pseudoknots, 2000, Bioinform.

[18] Sean R. Eddy, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, 1998.

[19] Yasuyuki Kurihara, et al. Imino proton NMR analysis of HDV ribozymes: nested double pseudoknot structure and Mg2+ ion-binding site close to the catalytic core in solution, 2002, Nucleic Acids Research.

[20] Chantal Ehresmann, et al. In Vitro Evidence for a Long Range Pseudoknot in the 5′-Untranslated and Matrix Coding Regions of HIV-1 Genomic RNA, 2002, The Journal of Biological Chemistry.

[21] Tatsuya Akutsu, et al. Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots, 2000, Discret. Appl. Math.

[22] S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of Molecular Biology.

[23] Christian Zwieb, et al. tmRDB (tmRNA database), 2000, Nucleic Acids Res.

[24] Robert Giegerich, et al. Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics, 2004, BMC Bioinformatics.

[25] Christian N. S. Pedersen, et al. RNA Pseudoknot Prediction in Energy-Based Models, 2000, J. Comput. Biol.

[26] Satoshi Kobayashi, et al. Tree Adjoining Grammars for RNA Structure Prediction, 1999, Theor. Comput. Sci.