Predicting RNA secondary structure via adaptive deep recurrent neural networks with energy-based filter

RNA secondary structure prediction is an important problem in structural bioinformatics, and predicting pseudoknotted RNA secondary structures is NP-hard. Recently, many machine-learning methods, including Markov models and neural networks, have been applied to this problem with encouraging predictive accuracy; however, their performance is usually limited by over-fitting and by the learning model's requirement for a fixed number of training features. Because most natural biological sequences vary in length, they have to be truncated before their features can be used by the learning model, which not only loses information but also destroys the integrity of the biological sequence. To address this problem, we propose a deep-learning model that adapts to sequence length, and we integrate an energy-based filter to remove over-fitted base pairs. Comparative experiments conducted on the authoritative RNA STRAND dataset (RNA secondary STRucture and statistical ANalysis Database) showed a 12% higher accuracy relative to three currently used methods.
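The two ideas in the abstract, encoding a sequence at its full length and filtering implausible predicted base pairs, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the 0-based index convention, and in particular the filter, which uses two simple structural rules (canonical pairing and a minimum hairpin loop size) as a stand-in for an energy model. The abstract does not specify the energy function the authors actually use.

```python
# Hypothetical sketch; names and rules are illustrative, not taken from the paper.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of bases enclosed by a pair


def one_hot(seq):
    """Encode a sequence of any length, one 4-dim vector per base (no truncation)."""
    alphabet = "ACGU"
    return [[1 if base == a else 0 for a in alphabet] for base in seq]


def filter_pairs(seq, pairs):
    """Keep only predicted pairs that are canonical and enclose at least MIN_LOOP bases."""
    kept = []
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if j - i - 1 < MIN_LOOP:
            continue  # enclosed loop too short to be sterically plausible
        if (seq[i], seq[j]) not in CANONICAL:
            continue  # not a Watson-Crick or G-U wobble pair
        kept.append((i, j))
    return kept


seq = "GGGAAAUCCC"
predicted = [(0, 9), (1, 8), (2, 7), (3, 5), (4, 9)]  # raw, possibly over-fitted pairs
print(len(one_hot(seq)))             # one feature vector per base
print(filter_pairs(seq, predicted))  # pairs surviving the stand-in filter
```

In this example the pair (3, 5) is dropped because it encloses a single base, and (4, 9) is dropped because A-C is not a canonical pair. A real energy-based filter would instead score each pair or helix with thermodynamic parameters.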
