Multivariate Information Fusion With Fast Kernel Learning to Kernel Ridge Regression in Predicting LncRNA-Protein Interactions

Long non-coding RNAs (lncRNAs) constitute a large class of transcribed RNA molecules. They are longer than 200 nucleotides and do not encode proteins. They play an important role in regulating gene expression by interacting with the homologous RNA-binding proteins. Because wet-lab experimental methods are laborious and time-consuming, computational approaches for predicting lncRNA-protein interactions (LPIs) deserve more attention. A review of state-of-the-art in silico studies shows that there is still room to improve both accuracy and speed. This paper proposes a novel method for identifying LPIs that employs Kernel Ridge Regression based on Fast Kernel Learning (LPI-FKLKRR). The approach uses four distinct similarity measures for each of the lncRNA and protein spaces. Notably, we extract Gene Ontology (GO) information for proteins to improve the quality of information in the protein space. To integrate the heterogeneous kernels, Fast Kernel Learning (FastKL) is applied to optimize the kernel weights. The final predicted associations are then obtained with Kernel Ridge Regression (KRR). Experimental results show that LPI-FKLKRR outperforms existing LPI prediction schemes. On a benchmark dataset, LPI-FKLKRR achieves the best Area Under the Precision-Recall Curve (AUPR) of 0.6950, outperforming the integrated LPLNP (AUPR: 0.4584), RWR (AUPR: 0.2827), CF (AUPR: 0.2357), LPIHN (AUPR: 0.2299), and LPBNI (AUPR: 0.3302). Together with the results of a case study on a novel dataset, these findings suggest that LPI-FKLKRR will be a useful tool for LPI prediction.
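The pipeline described above (several kernels per space, a weighted combination, then KRR) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact KRR formulation, so the sketch assumes a two-step Kronecker form, F = K_l (K_l + λ_l I)^(-1) Y (K_p + λ_p I)^(-1) K_p, and it uses fixed, user-supplied kernel weights as a stand-in for the weights that FastKL would optimize. The function names `combine_kernels` and `two_step_krr` and the regularization parameters `lam_l` and `lam_p` are hypothetical.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Weighted sum of same-shaped kernel matrices.

    Weights are normalized to sum to 1. In the paper these weights come
    from FastKL; here they are simply supplied by the caller (assumption).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))


def two_step_krr(K_l, K_p, Y, lam_l=1.0, lam_p=1.0):
    """Two-step kernel ridge regression on an interaction matrix (assumed form).

    K_l : (n, n) symmetric lncRNA kernel
    K_p : (m, m) symmetric protein kernel
    Y   : (n, m) known lncRNA-protein interaction matrix (0/1)
    Returns an (n, m) matrix of predicted interaction scores.
    """
    n, m = Y.shape
    # A = (K_l + lam_l * I)^-1 Y
    A = np.linalg.solve(K_l + lam_l * np.eye(n), Y)
    # B = A (K_p + lam_p * I)^-1, using the symmetry of K_p
    B = np.linalg.solve(K_p + lam_p * np.eye(m), A.T).T
    return K_l @ B @ K_p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m = 6, 4  # toy sizes: 6 lncRNAs, 4 proteins

    # Four toy positive semi-definite kernels per space, built as X X^T.
    lnc_kernels = [(X := rng.normal(size=(n, 3))) @ X.T for _ in range(4)]
    pro_kernels = [(X := rng.normal(size=(m, 3))) @ X.T for _ in range(4)]

    # Uniform weights as a placeholder for FastKL-optimized weights.
    K_l = combine_kernels(lnc_kernels, [1, 1, 1, 1])
    K_p = combine_kernels(pro_kernels, [1, 1, 1, 1])

    Y = (rng.random((n, m)) < 0.3).astype(float)
    scores = two_step_krr(K_l, K_p, Y)
    print(scores.shape)
```

Higher scores in the returned matrix indicate lncRNA-protein pairs that the model considers more likely to interact; ranking these scores against held-out known interactions is what an AUPR evaluation would measure.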
