Fast Structural Alignment of RNAs by Optimizing the Adjoining Order of Profile-csHMMs

A novel RNA structural alignment method based on profile context-sensitive hidden Markov models (profile-csHMMs) has been proposed. In principle, the profile-csHMM based approach can handle any kind of RNA secondary structure, including pseudoknots, and it has been shown to find highly accurate RNA alignments. To find the optimal alignment, the method employs the sequential component adjoining (SCA) algorithm, which finds the optimal state sequence of a profile-csHMM. The computational complexity of the SCA algorithm is not fixed; it depends on the so-called adjoining order, which describes how the optimal state sequence is traced back in a given profile-csHMM. Fast RNA structural alignment therefore requires finding the adjoining order with the minimum computational cost. In this paper, we propose an efficient algorithm that systematically finds the optimal adjoining order, that is, the order that minimizes the computational cost of finding the RNA alignment. Numerical experiments show that the proposed algorithm can make the alignment up to 3.6 times faster, without any degradation in the quality of the RNA alignments.
