Efficient Framework for Predicting ncRNA-Protein Interactions Based on Sequence Information by Deep Learning

Interactions between RNA and proteins (RPIs) play a crucial role in most cellular processes, such as RNA stability and translation. Although many high-throughput experiments have recently been developed to detect RPIs, they remain largely time-consuming and labor-intensive. An efficient computational method for predicting RPIs is therefore urgently needed. In this study, we put forward a novel approach for predicting ncRNA-protein interactions based on sequence information only. Representative features of proteins and ncRNAs were extracted using the bi-gram probability feature extraction method and the k-mer algorithm. To evaluate the performance of the proposed model, a random forest classifier was trained and tested on two widely used datasets, RPI1807 and RPI2241, using five-fold cross-validation. The model achieved AUCs of 0.992 on RPI1807 and 0.947 on RPI2241, indicating that the approach is effective for predicting RPIs and can serve as a reference for future research in the biological field.
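
As an illustration of the k-mer step mentioned above, the following is a minimal sketch of how an ncRNA sequence could be turned into a fixed-length vector of normalized k-mer frequencies. The function name `kmer_frequencies`, the choice of k, the `ACGU` alphabet, and the policy of skipping windows that contain other characters are assumptions made for this example; the abstract does not specify these details, and this is not the authors' implementation.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies of `seq` as a fixed-order list.

    The vector has len(alphabet) ** k entries, ordered as produced by
    itertools.product over `alphabet`. Windows containing characters
    outside `alphabet` are skipped (an assumption of this sketch).
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k over the sequence and count each k-mer.
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:
            counts[position] += 1
            total += 1
    # Normalize so that sequences of different lengths are comparable.
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


# Hypothetical toy sequence, used only to show the call.
features = kmer_frequencies("AUGGCUAUGG", k=2)
```

A vector like this, concatenated with a corresponding protein feature vector, could then be passed to a classifier such as a random forest and scored with five-fold cross-validation, as the abstract describes.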
