Graph-Based Analysis of RNA Secondary Structure Similarity Comparison

Ribonucleic acid (RNA) plays an essential role in living organisms, and more and more of its functions are being discovered. Owing to the conserved nature of RNA sequences, the function of an RNA molecule depends mainly on its secondary structure. Identifying similarity between two RNA secondary structures therefore helps clarify their functional relationship, which makes exploring structural similarity through graphical representations of RNA secondary structures an important and pressing task. This paper proposes a novel graphical analysis method based on a triple vector curve representation of RNA secondary structures. A combined approach using the discrete wavelet transform (DWT) and a sliding-window fractal dimension is introduced to extract features from these curves and compare them, and a distance matrix is generated from the comparison. The distance matrix is then clustered and visualized as a clustering tree. Experiments on RNA virus and noncoding RNA datasets, with analysis of the resulting clustering trees, show that the proposed method compares RNA secondary structures more accurately.
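The pipeline summarized above (curve representation, DWT, sliding-window fractal dimension, distance matrix, clustering tree) can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not specify the triple vector mapping, the fractal dimension estimator, the window size, the feature summary, the distance measure, or the clustering algorithm. The sketch therefore makes the following choices: a hypothetical mapping from dot-bracket symbols to 3D step vectors, one level of the Haar DWT, the Petrosian fractal dimension over sliding windows, the mean and standard deviation of the windowed values as features, Euclidean distance, and UPGMA (average-linkage) clustering. All function names and parameters are illustrative.

```python
import math

# HYPOTHETICAL triple vector mapping (not the paper's published definition):
# x advances one step per position, y rises on '(' and falls on ')',
# z counts unpaired '.' positions. Pseudoknot brackets are not handled.
STEP = {"(": (1, 1, 0), ")": (1, -1, 0), ".": (1, 0, 1)}


def triple_vector_curve(structure):
    """Cumulative 3D curve for a dot-bracket secondary structure."""
    x = y = z = 0
    points = []
    for symbol in structure:
        dx, dy, dz = STEP[symbol]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def haar_approximation(signal):
    """One level of the Haar DWT, keeping only the approximation coefficients."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]  # pad odd-length input by repeating the last sample
    return [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]


def petrosian_fd(window):
    """Petrosian fractal dimension (an assumed choice of FD estimator)."""
    n = len(window)
    if n < 2:
        return 1.0  # degenerate window: treat as a straight line
    diffs = [b - a for a, b in zip(window, window[1:])]
    # Count switches between negative and non-negative consecutive differences.
    changes = sum(1 for a, b in zip(diffs, diffs[1:]) if (a < 0) != (b < 0))
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * changes)))


def features(structure, window=8, step=2):
    """Mean and std of sliding-window FD over the DWT-smoothed y coordinate."""
    curve = triple_vector_curve(structure)
    # This sketch analyzes only y (nesting depth); x is linear and z is monotone.
    signal = haar_approximation([float(p[1]) for p in curve])
    fds = [petrosian_fd(signal[i:i + window])
           for i in range(0, len(signal) - window + 1, step)]
    if not fds:
        fds = [petrosian_fd(signal)]  # sequence shorter than one window
    mean = sum(fds) / len(fds)
    std = math.sqrt(sum((f - mean) ** 2 for f in fds) / len(fds))
    return (mean, std)


def distance_matrix(structures):
    """Pairwise Euclidean distances between feature vectors."""
    feats = [features(s) for s in structures]
    return [[math.dist(a, b) for b in feats] for a in feats]


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a Newick-style tree string."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (label, size)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), _ = min(d.items(), key=lambda kv: kv[1])  # closest pair, a < b
        (la, na), (lb, nb) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)  # size-weighted average
        del d[(a, b)]
        clusters[next_id] = (f"({la},{lb})", na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For example, two copies of `"((((....))))....((((....))))"` named `a` and `b` plus a fully unpaired 28-nt structure named `c` give a distance of 0 between `a` and `b`, and `upgma(["a", "b", "c"], distance_matrix([...]))` returns `"(c,(a,b));"`. Averaging windowed fractal dimensions into two summary numbers keeps feature vectors the same length for structures of different lengths. However, it discards positional detail that a real method would likely retain.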
