RPI-SE: a stacking ensemble learning framework for ncRNA-protein interactions prediction using sequence information

Background: The interactions between non-coding RNAs (ncRNAs) and proteins play an essential role in many biological processes. Several high-throughput experimental methods have been applied to detect ncRNA-protein interactions, but these methods are time-consuming and expensive. Accurate and efficient computational methods can assist and accelerate the study of ncRNA-protein interactions.

Results: In this work, we develop a stacking ensemble computational framework, RPI-SE, for predicting ncRNA-protein interactions. To exploit protein and RNA sequence features, a Position Weight Matrix combined with Legendre Moments is applied to obtain protein evolutionary information, and a k-mer sparse matrix is employed to extract features from ncRNA sequences. An ensemble learning framework that integrates different types of base classifiers is then used to predict ncRNA-protein interactions from these discriminative features. The accuracy and robustness of RPI-SE were evaluated on three benchmark data sets under five-fold cross-validation and compared with other state-of-the-art methods.

Conclusions: The results demonstrate that RPI-SE performs the ncRNA-protein interaction prediction task with high accuracy and robustness. It is anticipated that this work can provide a computational prediction tool to advance biomedical research related to ncRNA-protein interactions.
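To make the k-mer sparse matrix idea concrete, the following is a minimal sketch of one common construction, not the authors' implementation. It assumes a 4^k x (L - k + 1) binary matrix in which entry (i, j) is 1 when the window starting at position j equals the i-th k-mer. The abstract does not give these details, so the function names `kmer_sparse_matrix` and `kmer_counts`, the ACGU alphabet ordering, and the handling of ambiguous bases are all illustrative assumptions.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; T is mapped to U below


def kmer_sparse_matrix(seq, k=3):
    """Build a (4**k) x (len(seq) - k + 1) 0/1 matrix (hypothetical helper).

    Row i corresponds to the i-th k-mer in lexicographic ACGU order;
    column j corresponds to the window seq[j:j+k].
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_cols = len(seq) - k + 1
    if n_cols < 1:
        raise ValueError("sequence is shorter than k")
    matrix = [[0] * n_cols for _ in range(len(kmers))]
    for j in range(n_cols):
        row = index.get(seq[j:j + k])
        if row is not None:  # windows with ambiguous bases stay all-zero
            matrix[row][j] = 1
    return matrix, kmers


def kmer_counts(matrix):
    """Collapse the sparse matrix to per-k-mer occurrence counts."""
    return [sum(row) for row in matrix]


if __name__ == "__main__":
    m, names = kmer_sparse_matrix("ACGUACG", k=2)
    for name, count in zip(names, kmer_counts(m)):
        if count:
            print(name, count)
```

Each column holds at most a single 1, so the matrix is very sparse. A pipeline of this kind would typically reduce it to a fixed-length vector (for example by a matrix decomposition) before passing it to the base classifiers; that step is omitted here.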
