Research on RNA secondary structure prediction via bidirectional recurrent neural network

Background: RNA secondary structure prediction is an important research topic in bioinformatics. Predicting RNA secondary structure with pseudoknots has been proved to be NP-hard. Because of constraints inherent in their models, traditional machine learning methods cannot effectively use RNA sequences of different lengths in the prediction process. In addition, the number of paired bases in RNA sequences differs greatly from the number of unpaired bases, and this imbalance between positive and negative samples can easily trap the model in a local optimum. To address these problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. By introducing a flag vector, the model accepts sequences of different lengths. It also makes full use of the bases before and after the base being predicted, and avoids the information loss caused by truncating sequences. A weight vector that dynamically adjusts the loss of each base in the RNA training set resolves the sample imbalance problem.

Results: The proposed algorithm is compared with existing algorithms on five representative subsets of the RNA STRAND data set. The experimental results show that the accuracy and the Matthews correlation coefficient of the method improve by 4.7% and 11.4%, respectively.

Conclusions: The flag vector allows the model to use the information both before and after each base in the RNA sequence, and the weight vector solves the sample imbalance problem. Compared with the other algorithms, the VLDB GRU algorithm proposed in this paper achieves the best prediction results.
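The abstract does not give the model's equations, so the following is only a minimal sketch under stated assumptions: it treats the flag vector as a padding mask (1 for a real base, 0 for padding) and the weight vector as inverse-frequency class weights in a binary paired/unpaired cross-entropy. Neither choice is confirmed by the source. The padding symbol `PAD`, the helper names and the toy labels are hypothetical, and the bidirectional GRU itself is omitted; it is assumed to supply the per-base pairing probabilities. The Matthews correlation coefficient uses its standard confusion-matrix definition.

```python
import math

PAD = "N"  # hypothetical padding symbol (assumption, not from the paper)


def pad_with_flags(seqs, pad=PAD):
    """Pad RNA strings to a common length and build flag vectors.

    flag = 1 marks a real base, flag = 0 marks padding, so no sequence
    has to be truncated to fit a fixed input length.
    """
    max_len = max(len(s) for s in seqs)
    padded, flags = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        padded.append(s + pad * n_pad)
        flags.append([1] * len(s) + [0] * n_pad)
    return padded, flags


def class_weights(labels, flags):
    """Inverse-frequency weights (paired=1, unpaired=0) over real bases only.

    Assumed scheme: w = total / (2 * count), so both classes contribute
    equally to the summed loss and the mean weight per real base is 1.
    """
    real = [y for ys, fs in zip(labels, flags) for y, f in zip(ys, fs) if f]
    total, n_pos = len(real), sum(real)
    n_neg = total - n_pos
    return total / (2 * n_pos), total / (2 * n_neg)


def weighted_masked_bce(probs, labels, flags, eps=1e-7):
    """Weighted binary cross-entropy averaged over real bases; padding is skipped."""
    w_pos, w_neg = class_weights(labels, flags)
    loss, count = 0.0, 0
    for ps, ys, fs in zip(probs, labels, flags):
        for p, y, f in zip(ps, ys, fs):
            if not f:
                continue  # padded position contributes nothing to the loss
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            w = w_pos if y == 1 else w_neg
            loss -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return loss / count


def mcc(preds, labels, flags):
    """Matthews correlation coefficient over real bases (1 = paired)."""
    tp = fp = tn = fn = 0
    for ps, ys, fs in zip(preds, labels, flags):
        for p, y, f in zip(ps, ys, fs):
            if not f:
                continue
            if p and y:
                tp += 1
            elif p:
                fp += 1
            elif y:
                fn += 1
            else:
                tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    padded, flags = pad_with_flags(["GGAAAAAACC", "GAAAAUC"])
    print(padded)  # ['GGAAAAAACC', 'GAAAAUCNNN']
    # Toy labels (1 = paired); the trailing 0s sit on padded slots and are ignored.
    labels = [[1, 1, 0, 0, 0, 0, 0, 0, 1, 1],
              [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]]
    preds = [[1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
             [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]]
    print(class_weights(labels, flags))
    print(round(mcc(preds, labels, flags), 4))  # 0.7424
```

Because padded positions are skipped both in the weights and in the loss, changing the predictions on padded slots leaves the loss unchanged, which is the property the flag vector is meant to guarantee. A framework implementation would typically express the same idea with a mask tensor and a per-element weighted loss rather than Python loops.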
