A New Method of RNA Secondary Structure Prediction Based on Convolutional Neural Network and Dynamic Programming

In recent years, RNA secondary structure information has played an important role in research on RNA and gene function. Although some RNA secondary structures can be determined experimentally, in most cases efficient and accurate computational methods are still needed to predict them. Current RNA secondary structure prediction methods are mainly based on the minimum free energy algorithm, which iteratively searches for the optimal folding state of RNA in vivo subject to minimum energy or other constraints. However, because of the complexity of the biological environment, a real RNA structure settles into a state of balanced biological potential energy rather than the optimal folding state that minimizes energy. For a short RNA sequence, this equilibrium energy state is close to the minimum free energy state, so the minimum free energy algorithm predicts its secondary structure with relatively high accuracy. In a longer RNA sequence, however, repeated folding drives the equilibrium energy state far from the minimum free energy state. This deviation stems from the structural complexity of long RNAs and causes a serious decline in the accuracy of secondary structure prediction. In this paper, we propose a novel RNA secondary structure prediction algorithm that combines a convolutional neural network model with a dynamic programming method to improve accuracy using large-scale RNA sequence and structure data. We analyze existing experimental RNA sequence and structure data to construct a deep convolutional network model, which extracts implicit features useful for classification from large-scale data and predicts the pairing probability of each base in an RNA sequence. An enhanced dynamic programming method is then applied to these base-pairing probabilities to obtain the optimal RNA secondary structure.
Results indicate that our proposed method outperforms common RNA secondary structure prediction algorithms on three benchmark RNA families. Given the characteristics of deep learning algorithms, it can be inferred that the proposed method achieves a prediction success rate 30% higher than that of other algorithms, and that such a method will be needed as the amount of real RNA structure data increases in the future.
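To make the second stage concrete, the following is a minimal sketch of how a dynamic programming step could turn predicted pairing probabilities into a structure. It is not the enhanced method of the paper, whose details the abstract does not give. It uses a plain Nussinov-style recurrence that maximizes the summed pairing score, and it assumes the network output is available as a pairwise matrix `prob[i][j]`. The names `fold`, `prob`, `min_loop`, and `ALLOWED` are hypothetical, as is the restriction to Watson-Crick and G-U wobble pairs.

```python
from typing import List, Optional

# Assumption of this sketch: only Watson-Crick and G-U wobble pairs may form.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold(seq: str, prob: List[List[float]], min_loop: int = 3) -> str:
    """Return a dot-bracket structure maximizing the summed pairing scores.

    prob[i][j] is a hypothetical pairing score for bases i and j (i < j),
    standing in for the output of a trained network.
    """
    n = len(seq)
    # best[i][j] holds the maximum total score for the subsequence seq[i..j].
    best = [[0.0] * n for _ in range(n)]

    def pair_score(i: int, j: int) -> Optional[float]:
        # A pair needs at least min_loop unpaired bases between its ends.
        if j - i <= min_loop or (seq[i], seq[j]) not in ALLOWED:
            return None
        return prob[i][j]

    # Fill the table from short spans to long spans.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j unpaired
            p = pair_score(i, j)
            if p is not None:
                score = max(score, best[i + 1][j - 1] + p)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain any pair
        p = pair_score(i, j)
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif p is not None and best[i][j] == best[i + 1][j - 1] + p:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


if __name__ == "__main__":
    seq = "GGGAAACCC"
    # Toy scores chosen by hand for illustration, not real model output.
    prob = [[0.0] * len(seq) for _ in seq]
    for i, j in [(0, 8), (1, 7), (2, 6)]:
        prob[i][j] = 0.9
    print(fold(seq, prob))  # prints "(((...)))"
```

The table fill takes cubic time in the sequence length, which is typical for this family of recurrences. Because ties are resolved in favor of leaving bases unpaired, pairs with a score of zero never appear in the output.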
