KATZLGO: Large-Scale Prediction of LncRNA Functions by Using the KATZ Measure Based on Multiple Networks

Accumulating evidence has shown that long non-coding RNAs (lncRNAs) play key roles in many cellular biological processes, such as epigenetic regulation, transcriptional and post-transcriptional regulation of gene expression, and cell differentiation. However, most lncRNAs have not been functionally characterized, and there is an urgent need for computational approaches that can annotate the functions of the growing number of available lncRNAs. In this article, we propose a global network-based method, KATZLGO, to predict the functions of human lncRNAs at large scale. A global network is constructed by integrating three heterogeneous networks: a lncRNA-lncRNA similarity network, a lncRNA-protein association network, and a protein-protein interaction network. The KATZ measure is then employed to calculate similarities between lncRNAs and proteins in the global network, and lncRNAs are annotated with the Gene Ontology (GO) terms of their neighboring protein-coding genes based on these KATZ similarity scores. The performance of KATZLGO is evaluated on a manually annotated lncRNA benchmark and on a protein-coding gene benchmark with known function annotations. KATZLGO significantly outperforms a state-of-the-art computational method in both maximum F-measure and coverage. Furthermore, we apply KATZLGO to predict the functions of human lncRNAs and successfully map 12,318 human lncRNA genes to GO terms.
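
The abstract describes the pipeline only at a high level, so the following Python sketch illustrates one plausible reading of it rather than the published implementation. Its assumptions are not taken from KATZLGO itself: the global adjacency matrix is the block matrix [[LL, LP], [LPᵀ, PP]] with lncRNAs listed first; the KATZ measure is truncated at a maximum walk length k_max with damping factor β, S = Σ_{k=1..k_max} β^k A^k; each lncRNA receives the GO terms of its top_k proteins ranked by KATZ score, with a term scored by the summed KATZ scores of the proteins carrying it; and maximum F-measure follows the CAFA-style gene-centric definition. The function names, parameter values, and GO identifiers in the toy data are all hypothetical placeholders.

```python
from typing import Dict, List, Set

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product; adequate for toy-sized networks."""
    return [[sum(row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for row in a]


def build_global_network(ll: Matrix, lp: Matrix, pp: Matrix) -> Matrix:
    """Global adjacency [[LL, LP], [LP^T, PP]]: lncRNAs first, then proteins."""
    n_l, n_p = len(ll), len(pp)
    lp_t = [[lp[i][j] for i in range(n_l)] for j in range(n_p)]
    return [ll[i] + lp[i] for i in range(n_l)] + [lp_t[j] + pp[j] for j in range(n_p)]


def katz(a: Matrix, beta: float, k_max: int) -> Matrix:
    """Truncated KATZ (assumed form): S = sum_{k=1..k_max} beta^k * A^k.

    A^k[i][j] counts walks of length k from i to j, so each extra step
    is down-weighted by another factor of beta.
    """
    n = len(a)
    s = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in a]  # holds A^k, starting at k = 1
    for k in range(1, k_max + 1):
        weight = beta ** k
        for i in range(n):
            for j in range(n):
                s[i][j] += weight * power[i][j]
        if k < k_max:
            power = matmul(power, a)
    return s


def predict_go(s: Matrix, n_l: int, protein_go: List[Set[str]],
               top_k: int) -> List[Dict[str, float]]:
    """Transfer GO terms from each lncRNA's top_k proteins by KATZ score.

    A term's score is the summed KATZ score of selected proteins carrying it,
    divided by the lncRNA's best term score so values lie in (0, 1].
    """
    predictions = []
    for i in range(n_l):
        ranked = sorted(((s[i][n_l + j], j) for j in range(len(protein_go))),
                        reverse=True)[:top_k]
        scores: Dict[str, float] = {}
        for katz_score, j in ranked:
            if katz_score <= 0:
                continue  # protein not reachable within k_max steps
            for term in protein_go[j]:
                scores[term] = scores.get(term, 0.0) + katz_score
        best = max(scores.values(), default=0.0)
        predictions.append({g: v / best for g, v in scores.items()} if best > 0 else {})
    return predictions


def fmax(predictions: List[Dict[str, float]], truth: List[Set[str]],
         steps: int = 100) -> float:
    """CAFA-style maximum F-measure over thresholds t = 1/steps, ..., 1.

    Precision is averaged over genes with at least one term scored >= t;
    recall is averaged over all evaluated genes.
    """
    best = 0.0
    for step in range(1, steps + 1):
        t = step / steps
        precisions, recalls = [], []
        for pred, true in zip(predictions, truth):
            called = {g for g, v in pred.items() if v >= t}
            hits = len(called & true)
            if called:
                precisions.append(hits / len(called))
            recalls.append(hits / len(true) if true else 0.0)
        if precisions:
            p, r = sum(precisions) / len(precisions), sum(recalls) / len(recalls)
            if p + r > 0:
                best = max(best, 2 * p * r / (p + r))
    return best


# Hypothetical toy data: lncRNAs L0, L1 and proteins P0, P1, P2.
LL = [[0, 1], [1, 0]]                   # lncRNA-lncRNA similarity
LP = [[1, 0, 0], [0, 0, 0]]             # lncRNA-protein associations
PP = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # protein-protein interactions
PROTEIN_GO = [{"GO:A"}, {"GO:A", "GO:B"}, {"GO:C"}]

A = build_global_network(LL, LP, PP)
S = katz(A, beta=0.5, k_max=3)
preds = predict_go(S, n_l=2, protein_go=PROTEIN_GO, top_k=2)
print(preds)
print(fmax(preds, [{"GO:A"}, {"GO:A", "GO:B"}]))
```

In the toy network the five nodes form a single path (L1, L0, P0, P1, P2), so L1 still receives GO terms even though it has no direct protein association: its only route to proteins runs through its similarity to L0. This illustrates why scoring over the integrated global network can annotate lncRNAs that lack direct protein links. The pure-Python matrix product keeps the sketch dependency-free; for networks with thousands of genes, a sparse or BLAS-backed matrix library would be the practical choice.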
