An Entropy-Based Method for Identifying Mutual Exclusive Driver Genes in Cancer

Cancer is, in essence, a complex disease of genomic alteration caused by somatic mutations accumulated over a lifetime. Previous research indicates that the first step toward overcoming cancer is to identify the driver genes that promote carcinogenesis. Extracting cancer-related driver genes precisely and efficiently remains a major challenge, however, because cancer is heterogeneous and tumors carry a very large number of irrelevant passenger mutations that have no functional impact on the cancer's development. In this work, we propose a novel entropy-based method, EntroRank, that identifies driver genes by integrating subcellular localization information and the mutual exclusivity of variation frequency into the network. EntroRank takes several properties of driver genes into account. To reflect the modularity of driver genes, the mutated genes in the network are first clustered into subgroups according to the compartments in which they are located. The structural entropy of each gene within its subgroup is then used to measure its indispensability. To reflect the mutual exclusivity between driver genes within modules, relative entropy is used to measure the degree of mutual exclusivity between two mutated genes in terms of their variation frequency. We applied our method to three cancers: lung, prostate, and breast cancer. The results show that our method not only detects well-known important drivers but also prioritizes rare, previously unknown driver genes. In addition, EntroRank can identify driver genes that are mutually exclusive. Compared with other existing methods, our method achieves better performance for most of the cancer types in terms of Precision, Recall, and F-score.
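The abstract does not give EntroRank's exact formulas, so the following is only a minimal sketch of how relative entropy (Kullback-Leibler divergence) could score mutual exclusivity between two mutated genes. It assumes that each gene's "variation frequency" is represented as a 0/1 mutation vector over tumor samples, normalized into a probability distribution with a small pseudocount. The function names, the pseudocount, and the symmetrization are illustrative assumptions, not the authors' definitions.

```python
import math


def mutation_distribution(mutations, pseudocount=1e-6):
    """Normalize a 0/1 mutation vector over samples into a distribution.

    The pseudocount (an assumption of this sketch) keeps every entry
    positive so that the relative entropy below stays finite.
    """
    smoothed = [m + pseudocount for m in mutations]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def exclusivity_score(mut_a, mut_b):
    """Symmetrized relative entropy between two genes' mutation profiles.

    Genes mutated in disjoint sets of samples produce a large score;
    genes with identical profiles produce a score of zero.
    """
    p = mutation_distribution(mut_a)
    q = mutation_distribution(mut_b)
    return 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))


# Hypothetical toy data: six samples, 1 = gene mutated in that sample.
gene_a = [1, 1, 1, 0, 0, 0]
gene_b = [0, 0, 0, 1, 1, 1]  # never co-mutated with gene_a
gene_c = [1, 1, 0, 0, 0, 0]  # co-mutated with gene_a in two samples

print("a vs b:", exclusivity_score(gene_a, gene_b))
print("a vs c:", exclusivity_score(gene_a, gene_c))
```

In this toy setting the disjoint pair (gene_a, gene_b) scores higher than the overlapping pair (gene_a, gene_c), which matches the intuition that a larger divergence between mutation profiles indicates stronger mutual exclusivity. How EntroRank combines such a score with structural entropy inside each subcellular compartment is not specified in the abstract and is not modeled here.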
