iRBP-Motif-PSSM: Identification of RNA-Binding Proteins Based on Collaborative Learning

RNAs and RNA-binding proteins (RBPs) bind to each other in cells to form nuclear ribonucleoprotein (RNP) complexes, which play important roles in many biological processes, including gene regulation. Accurately predicting RBPs remains a challenging and actively studied problem. Here, we propose a new computational predictor, iRBP-Motif-PSSM, which identifies RBPs by combining motif information with evolutionary information extracted from Position-Specific Scoring Matrices (PSSMs). Collaborative learning is employed to address the instability of the predictor. Experimental results show that iRBP-Motif-PSSM performs better than other existing methods for identifying RNA-binding proteins, indicating that it is a useful tool for biological analysis.
