Identifying and ranking potential cancer drivers using representation learning on attributed network.

Cancer can arise as a consequence of the accumulation of genomic alterations, but only a small fraction of these alterations are driver mutations that contribute to cancer development and progression. Identifying the genes and alterations that act as drivers therefore plays a critical role in drug design, cancer diagnosis and treatment. In this study, we propose a novel method, Representation Learning on Attributed Graphs (RLAG), to identify potential cancer drivers. It is a first attempt to use both network structure and node attributes to learn feature representations for the genes in the network. The method then uses these feature vectors to divide the genes into several subgroups. Finally, potential cancer driver genes are prioritized by ranking scores that measure both the genes' own properties and their importance within their subgroups. We apply the method to predict driver genes for lung cancer, breast cancer and prostate cancer. The results show that it outperforms three other state-of-the-art methods in terms of Precision, Recall and F1-score.
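The grouping and ranking steps described above can be illustrated with a minimal sketch. It is not the RLAG implementation: the abstract does not specify the clustering algorithm or the scoring formula, so the sketch assumes that gene feature vectors have already been learned, substitutes plain k-means for the subgrouping step, and uses a hypothetical score (a per-gene attribute score multiplied by closeness to the subgroup centroid). The names `kmeans`, `rank_genes`, `attribute_scores` and `importance` are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means; a stand-in for the (unspecified) subgrouping step."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]

    def nearest(p):
        # Index of the centroid closest to point p.
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return [nearest(p) for p in points], centroids


def rank_genes(genes, embeddings, attribute_scores, k):
    """Rank genes by a hypothetical score: attribute score x subgroup closeness."""
    labels, centroids = kmeans(embeddings, k)
    scored = []
    for gene, vec, attr, label in zip(genes, embeddings, attribute_scores, labels):
        # Assumed importance measure: genes nearer their subgroup centroid rank higher.
        importance = 1.0 / (1.0 + math.dist(vec, centroids[label]))
        scored.append((gene, attr * importance))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_genes(gene_names, vectors, scores, k=2)` returns `(gene, score)` pairs sorted from highest to lowest score. In practice the attribute score would come from gene-level evidence such as mutation data, and the importance term could be replaced by any within-subgroup centrality measure.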
