Definition of a New Metric With Mutual Exclusivity and Coverage for Identifying Cancer Driver Modules

Identification of cancer driver modules or pathways is a key step in understanding cancer pathogenesis and exploring patient-specific treatments. Previous research has focused on identifying frequently mutated genes, yet numerous studies have shown that some genes with low mutation frequencies are also important for cancer progression. In this study, we propose iCDModule, a new framework built on a new metric, to identify driver modules that include low-frequency mutated genes. Inspired by the gravity model, we integrate the coverage and mutual exclusivity of mutation data to define a new metric between gene pairs, called the mutation impact distance. This metric helps identify potential driver gene sets, including genes that have extremely low mutation rates but play important roles in functional networks. A genetic network is constructed from the mutation impact distance, the driver module identification problem is formalized as a maximum clique problem, and an improved ant colony optimization algorithm is used to solve it. We apply iCDModule to TCGA breast cancer, glioblastoma, and ovarian cancer data to evaluate its performance. Experiments show that it accurately identifies known cancer driver modules and pathways and also detects driver modules containing low-frequency mutated genes. iCDModule significantly outperforms other existing methods in identifying driver modules.
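The abstract does not give the formula for the mutation impact distance, so the sketch below only illustrates the two ingredients it integrates: coverage and mutual exclusivity, computed on a binary sample-by-gene mutation matrix. It uses the standard definitions popularized by Dendrix (Vandin et al.), including the weight W(M) = 2|Γ(M)| − Σ_g |Γ(g)|, where Γ(·) is the set of samples in which a gene or gene set is mutated. This is not iCDModule's metric; the function names and the toy matrix are hypothetical.

```python
from itertools import combinations


def covered_samples(matrix, genes):
    """Samples (row indices) in which at least one gene in `genes` is mutated."""
    return {i for i, row in enumerate(matrix) if any(row[g] for g in genes)}


def coverage(matrix, genes):
    """Fraction of all samples covered by the gene set."""
    return len(covered_samples(matrix, genes)) / len(matrix)


def exclusivity(matrix, genes):
    """Fraction of covered samples in which exactly one gene of the set is mutated."""
    covered = covered_samples(matrix, genes)
    if not covered:
        return 0.0
    exactly_one = sum(1 for i in covered if sum(matrix[i][g] for g in genes) == 1)
    return exactly_one / len(covered)


def dendrix_weight(matrix, genes):
    """Dendrix-style weight: 2 * |covered samples| - total mutations in the set.

    Equivalently, coverage minus overlap: it rewards gene sets that cover many
    samples and penalizes samples in which several genes of the set co-occur.
    """
    total_mutations = sum(row[g] for row in matrix for g in genes)
    return 2 * len(covered_samples(matrix, genes)) - total_mutations


if __name__ == "__main__":
    # Hypothetical toy data: rows are samples, columns are genes (1 = mutated).
    mutations = [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 1, 0],
        [0, 0, 0],
    ]
    # Score every gene pair, as a pairwise metric for network edges would require.
    for pair in combinations(range(3), 2):
        print(
            pair,
            coverage(mutations, pair),
            exclusivity(mutations, pair),
            dendrix_weight(mutations, pair),
        )
```

A gene set scores highly under this weight when it covers many samples while its genes are rarely mutated together. A pairwise score of this kind could weight the edges of a gene network, which is the role the mutation impact distance plays in iCDModule; how iCDModule combines the two quantities through the gravity model is not specified here.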
