Robust Inductive Matrix Completion Strategy to Explore Associations Between LincRNAs and Human Disease Phenotypes

Over the past few years, it has been established that a number of long intergenic non-coding RNAs (lincRNAs) are linked to a wide variety of human diseases, but the disease associations of many other lincRNAs remain a puzzle. Validating such links through biological experiments is expensive. On the other hand, large amounts of data about both lincRNAs and diseases are becoming available thanks to High-Throughput Sequencing (HTS) platforms, Genome-Wide Association Studies (GWAS), and similar efforts, opening opportunities for cutting-edge machine learning and data mining approaches. Yet only a few in silico lincRNA-disease association inference tools are available to date, and none of them uses side information about both entities. The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation framework between two entities that takes the side information of each into account. However, the IMC formulation cannot handle noise and outliers that may be present in the dataset, and data sparsity is another issue that the standard IMC method does not address. A robust version of IMC that resolves both issues is therefore needed. As a remedy, in this paper we propose Robust Inductive Matrix Completion (RIMC), which uses an $\ell_{2,1}$-norm loss function together with $\ell_{2,1}$-norm-based regularization.
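The role of the $\ell_{2,1}$ norm in the loss and the regularizer can be illustrated with a minimal sketch. It assumes the standard IMC prediction $X W H^\top Y^\top$ (lincRNA features $X$, disease features $Y$, low-rank factors $W$ and $H$) and an objective of the form $\|P_\Omega(A - X W H^\top Y^\top)\|_{2,1} + \lambda(\|W\|_{2,1} + \|H\|_{2,1})$, where $A$ is the observed association matrix, $P_\Omega$ keeps only observed entries (a 0/1 mask here), and $\|M\|_{2,1}$ sums the Euclidean norms of the rows of $M$. This exact objective, the row-wise convention, and the function names are assumptions for illustration, not the paper's formulation; the code only evaluates the objective and does not optimize it.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for small list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def l21_norm(m: Matrix) -> float:
    """l2,1 norm: sum over rows of each row's Euclidean (l2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in m)


def rimc_objective(A: Matrix, mask: Matrix, X: Matrix, Y: Matrix,
                   W: Matrix, H: Matrix, lam: float) -> float:
    """Hypothetical RIMC-style objective (assumed form, not the paper's):
    ||mask * (A - X W H^T Y^T)||_{2,1} + lam * (||W||_{2,1} + ||H||_{2,1}).
    """
    # X W H^T Y^T == (X W)(Y H)^T, which avoids forming H^T Y^T separately.
    pred = matmul(matmul(X, W), transpose(matmul(Y, H)))
    # Zero out unobserved lincRNA-disease pairs before measuring the residual.
    residual = [
        [mask[i][j] * (A[i][j] - pred[i][j]) for j in range(len(A[0]))]
        for i in range(len(A))
    ]
    return l21_norm(residual) + lam * (l21_norm(W) + l21_norm(H))
```

Because each residual row enters through its unsquared $\ell_2$ norm, an outlier row increases the loss linearly rather than quadratically as under a squared Frobenius loss, which is the usual motivation for $\ell_{2,1}$ losses; the same penalty on $W$ and $H$ tends to drive entire rows (feature dimensions) to zero.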
We applied RIMC to the available association data between human lincRNAs and OMIM disease phenotypes, together with a diverse set of side information about the lincRNAs and the diseases. Our method outperforms state-of-the-art methods in terms of precision@$k$ and recall@$k$ when prioritizing the top-$k$ diseases for the subject lincRNAs. We also demonstrate that RIMC is equally effective for queries about novel lincRNAs and for predicting the rank of a newly identified disease for a set of well-characterized lincRNAs. Availability: All supporting datasets are publicly accessible at http://biomecis.uta.edu/~ashis/res/RIMC/.
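The two evaluation metrics can be made concrete with a short sketch. It assumes that, for each lincRNA, all candidate diseases are ranked by predicted score and that the known (for example, held-out) associations count as relevant: precision@$k$ is the fraction of the top $k$ diseases that are relevant, and recall@$k$ is the fraction of relevant diseases that appear in the top $k$. The averaging over lincRNAs, the tie-breaking rule, and the identifiers used below ("lincA", "D1", and so on) are hypothetical; the paper's exact evaluation protocol is not described in this abstract.

```python
from typing import Dict, List, Set, Tuple


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k disease IDs with the highest predicted scores.

    Ties are broken alphabetically by disease ID so results are deterministic
    (an assumption made for this sketch).
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [disease for disease, _ in ranked[:k]]


def precision_recall_at_k(scores: Dict[str, float], relevant: Set[str],
                          k: int) -> Tuple[float, float]:
    """precision@k and recall@k for one lincRNA's ranked disease list."""
    hits = len(set(top_k(scores, k)) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall_at_k(all_scores: Dict[str, Dict[str, float]],
                               all_relevant: Dict[str, Set[str]],
                               k: int) -> Tuple[float, float]:
    """Average both metrics over lincRNAs with at least one relevant disease."""
    pairs = [
        precision_recall_at_k(all_scores[linc], all_relevant[linc], k)
        for linc in all_scores
        if all_relevant.get(linc)
    ]
    if not pairs:
        return 0.0, 0.0
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


if __name__ == "__main__":
    # Hypothetical toy scores, not data from the paper.
    scores = {
        "lincA": {"D1": 0.9, "D2": 0.8, "D3": 0.1, "D4": 0.05},
        "lincB": {"D1": 0.2, "D2": 0.7, "D3": 0.6, "D4": 0.1},
    }
    relevant = {"lincA": {"D1", "D3"}, "lincB": {"D2"}}
    print(mean_precision_recall_at_k(scores, relevant, 2))  # (0.5, 0.75)
```

Averaging per lincRNA (macro-averaging) gives each query lincRNA equal weight regardless of how many diseases it is linked to; pooling hits across all lincRNAs would be an equally valid alternative.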
