Accurate prediction of protein-lncRNA interactions by diffusion and HeteSim features across heterogeneous network

Background
Identifying the interactions between proteins and long non-coding RNAs (lncRNAs) is important for deciphering the functional mechanisms of lncRNAs. However, current experimental techniques for detecting lncRNA-protein interactions are limited and inefficient. Many computational methods have been proposed to predict protein-lncRNA interactions, but few of them exploit the topological information of the heterogeneous biological networks associated with lncRNAs.

Results
In this work, we propose PLIPCOM, a novel approach that uses two groups of network features to detect protein-lncRNA interactions. Specifically, diffusion features and HeteSim features are extracted from a protein-lncRNA heterogeneous network and then combined to build a prediction model with the Gradient Tree Boosting (GTB) algorithm. Our study highlights that the topological features of the heterogeneous network are crucial for predicting protein-lncRNA interactions. Cross-validation experiments on the benchmark dataset show that PLIPCOM substantially outperforms previous state-of-the-art approaches in predicting protein-lncRNA interactions. We also demonstrate the robustness of the proposed method on three imbalanced datasets. Moreover, case studies show that the method is effective and reliable in predicting interactions between lncRNAs and proteins.

Availability
The source code and supporting files are publicly available at http://denglab.org/PLIPCOM/.