A Novel Comparative Sequence Analysis Method for ncRNA Secondary Structure Prediction without Multiple Sequence Alignment

As research on non-coding RNA (ncRNA) develops, consensus secondary structures and helices routinely need to be derived from several homologous sequences. This problem can be addressed by comparative sequence analysis, and nearly all existing algorithms of this kind rely on multiple sequence alignment. However, there is no effective model that combines sequence similarity with secondary structure, so multiple sequence alignment remains an open problem, especially when applied to RNA secondary structure prediction. In this paper, we propose a novel comparative sequence analysis method that does not require multiple sequence alignment. Instead of comparing the sequences themselves, it compares the positions of helices in the dot plot matrices. We introduce the concept of the 'centroid of a helix' and design a novel algorithm for finding consensus helices. Experiments on tRNA show that our algorithm outperforms the main existing software, including Pfold, MARNA, CARNAC and RNAalifold.
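
The idea of comparing helix positions rather than aligned sequences can be illustrated with a minimal sketch. The abstract does not give the paper's definitions, so everything here is an assumption made for illustration: a helix is taken to be an anti-diagonal run of stacked canonical or G-U wobble pairs in the dot plot, its centroid is the mean of its (i, j) pair coordinates, and a helix counts as consensus when every other sequence has a helix centroid within a fixed tolerance. The names `find_helices`, `centroid`, `consensus_centroids`, `min_len`, `min_loop` and `tol` are hypothetical and do not come from the paper.

```python
# Allowed base pairs in the dot plot (assumption: Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_helices(seq, min_len=3, min_loop=3):
    """Return helices as lists of (i, j) pairs, read off anti-diagonal runs of the dot plot."""
    n = len(seq)

    def pairs(i, j):
        # A dot exists at (i, j) if the bases can pair and the enclosed loop is long enough.
        return 0 <= i < j < n and j - i > min_loop and (seq[i], seq[j]) in PAIRS

    helices = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            # Start a run only at its outermost pair so each helix is reported once.
            if not pairs(i, j) or pairs(i - 1, j + 1):
                continue
            run = []
            a, b = i, j
            while pairs(a, b):
                run.append((a, b))
                a, b = a + 1, b - 1
            if len(run) >= min_len:
                helices.append(run)
    return helices


def centroid(helix):
    """Mean (i, j) position of a helix in the dot plot (assumed definition)."""
    k = len(helix)
    return (sum(i for i, _ in helix) / k, sum(j for _, j in helix) / k)


def consensus_centroids(seqs, tol=2.0, min_len=3):
    """Centroids from the first sequence that have a nearby centroid in every other sequence."""
    per_seq = [[centroid(h) for h in find_helices(s, min_len)] for s in seqs]

    def near(c, others):
        return any(abs(c[0] - o[0]) <= tol and abs(c[1] - o[1]) <= tol for o in others)

    return [c for c in per_seq[0] if all(near(c, other) for other in per_seq[1:])]


if __name__ == "__main__":
    seqs = ["GGGGAAAACCCC", "GGGGAUAACCCC"]
    print(consensus_centroids(seqs, tol=2.0, min_len=4))
```

The sketch uses absolute positions and a fixed tolerance, which only makes sense for homologous sequences of similar length; a real implementation would also need to handle length differences and score competing helices.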
