Alignment of Noncoding Ribonucleic Acids with Pseudoknots Using Context-Sensitive Hidden Markov Model

Various signal processing techniques have been used to predict protein-coding genes, but these techniques are unsuitable for predicting noncoding ribonucleic acids (RNAs). Modeling a gene network can be employed in many fields. Examples include discovering new drugs, reducing the side effects of treatments, and identifying genetic diseases and treating genetic disorders by influencing the activity of the genes involved. It can also help prevent the growth of unwanted tissues by weakening cell growth and reproduction, and it has many other applications in medicine and agriculture. The main purpose of this study was to design an algorithm based on context-sensitive hidden Markov models (csHMMs) for aligning the secondary structures of RNAs, which can be used to identify noncoding RNAs. In this model, several RNA families are compared and the similarities between them are measured. The model's parameters are estimated with the expectation–maximization (EM) algorithm, the standard method for maximum-likelihood estimation of HMM parameters. Alignment of RNAs from the hepatitis delta virus family achieved an accuracy of 83.33%, a specificity of 89%, and a sensitivity of 97%. For RNAs from the purine family, the accuracy was 65%, the specificity 76%, and the sensitivity 76%. The results show that csHMMs can align not only the primary sequences of RNAs but also their secondary structures with high accuracy.
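
To make the EM step concrete, the sketch below shows Baum–Welch, the EM algorithm for an ordinary discrete HMM. It uses nucleotides A, C, G and U encoded as symbols 0 to 3. This is a simplification, not the study's implementation. A plain HMM cannot model the base-pair correlations that csHMMs capture with their memory-equipped pairwise-emission and context-sensitive states, and the extension of EM to those states is not shown here. The function names, the two-state model, the initial parameters and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def forward_backward(obs: Sequence[int], pi: Vector, A: Matrix, B: Matrix
                     ) -> Tuple[Matrix, Matrix, Vector]:
    """Scaled forward-backward pass used in the E-step.

    Returns scaled alpha, scaled beta and the per-position scaling
    factors; log P(obs) equals the sum of log(scale).
    """
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]  # last row stays 1.0
    scale = [0.0] * T
    for t in range(T):
        for j in range(n):
            prev = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prev * B[j][obs[t]]
        scale[t] = sum(alpha[t])
        alpha[t] = [a / scale[t] for a in alpha[t]]  # normalize to avoid underflow
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    return alpha, beta, scale


def baum_welch(seqs: List[List[int]], pi: Vector, A: Matrix, B: Matrix,
               n_iter: int = 20) -> Tuple[Vector, Matrix, Matrix, Vector]:
    """EM re-estimation of (pi, A, B); also returns the log-likelihood
    of the training set before each update (non-decreasing under EM)."""
    n, m = len(pi), len(B[0])
    history: Vector = []
    for _ in range(n_iter):
        pi_num = [0.0] * n
        A_num = [[0.0] * n for _ in range(n)]
        A_den = [0.0] * n
        B_num = [[0.0] * m for _ in range(n)]
        B_den = [0.0] * n
        total = 0.0
        for obs in seqs:
            alpha, beta, scale = forward_backward(obs, pi, A, B)
            total += sum(math.log(c) for c in scale)
            T = len(obs)
            for t in range(T):
                gamma = [alpha[t][i] * beta[t][i] for i in range(n)]  # P(state i at t | obs)
                for i in range(n):
                    B_num[i][obs[t]] += gamma[i]
                    B_den[i] += gamma[i]
                    if t == 0:
                        pi_num[i] += gamma[i]
                    if t < T - 1:
                        A_den[i] += gamma[i]
                        for j in range(n):  # expected i -> j transitions (xi)
                            A_num[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                            * beta[t + 1][j] / scale[t + 1])
        history.append(total)
        # M-step: normalize expected counts into probabilities
        pi = [p / len(seqs) for p in pi_num]
        A = [[A_num[i][j] / A_den[i] for j in range(n)] for i in range(n)]
        B = [[B_num[i][k] / B_den[i] for k in range(m)] for i in range(n)]
    return pi, A, B, history


if __name__ == "__main__":
    code = {"A": 0, "C": 1, "G": 2, "U": 3}
    toy = [[code[c] for c in s] for s in ["GGGAAAUCCC", "GGCAUAGCC", "AUGGCUAA"]]
    _, _, _, hist = baum_welch(toy, [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]],
                               [[0.1, 0.4, 0.4, 0.1], [0.4, 0.1, 0.1, 0.4]], n_iter=10)
    print([round(h, 4) for h in hist])
```

The scaled forward-backward pass keeps the probabilities in a numerically safe range on longer sequences. The recorded log-likelihoods should never decrease from one iteration to the next, which is the defining property of an EM update.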