A random walk-based method to identify driver genes by integrating the subcellular localization and variation frequency into bipartite graph

Background: Cancer is a worldwide health problem driven by genomic alterations. With the advent of high-throughput sequencing technology, huge amounts of genomic data are generated continuously, offering valuable information about cancer while posing a major challenge to investigators. Cancer is highly heterogeneous, and most alterations are thought to be passenger mutations that make no contribution to cancer progression. Identifying driver genes, which confer a selective growth advantage on tumor cells, from such large and noisy data therefore remains an urgent task.

Results: Because previous network-based methods ignore some important biological properties of driver genes, and because gene interaction networks have low reliability, we propose a random walk method named Subdyquency that integrates subcellular localization, variation frequency, and each gene's interactions with other dysregulated genes to improve the prediction accuracy of driver genes. We applied our model to three cancers: lung, prostate and breast cancer. The results show that our model can not only identify well-known driver genes but also prioritize rare, previously unknown driver genes. Moreover, compared with other existing methods, our method improves precision, recall and F-score for most cancer types.

Conclusions: The results imply that driver genes tend to have higher variation frequency and to affect more dysregulated genes in the common significant compartment.

Availability: The source code can be obtained at https://github.com/weiba/Subdyquency.
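The general idea of a frequency-seeded random walk over a bipartite graph can be illustrated with a minimal sketch. This is not the Subdyquency implementation (see the repository for that). It is a generic random walk with restart under stated assumptions: the function name `rank_driver_candidates`, the `restart` parameter, and the toy numbers are hypothetical, and the edge weights in `bip` merely stand in for whatever localization-derived scores link a mutated gene to a dysregulated gene.

```python
import numpy as np

def rank_driver_candidates(freq, bip, restart=0.5, tol=1e-10, max_iter=1000):
    """Score mutated genes with a random walk with restart on a bipartite graph.

    freq : (m,) variation frequency of each mutated gene (assumed input).
    bip  : (m, n) non-negative edge weights between mutated genes (rows)
           and dysregulated genes (columns); hypothetical weighting.
    """
    freq = np.asarray(freq, dtype=float)
    bip = np.asarray(bip, dtype=float)

    # Restart distribution: walkers start at genes in proportion to frequency.
    p0 = freq / freq.sum()

    # Transition from mutated genes to dysregulated genes (row-normalised),
    # and back again (column-normalised). Zero rows/columns stay zero.
    row_sum = bip.sum(axis=1, keepdims=True)
    col_sum = bip.sum(axis=0, keepdims=True)
    to_dys = np.divide(bip, row_sum, out=np.zeros_like(bip), where=row_sum > 0).T
    to_mut = np.divide(bip, col_sum, out=np.zeros_like(bip), where=col_sum > 0)

    p = p0.copy()
    for _ in range(max_iter):
        q = to_dys @ p                                   # mass on dysregulated genes
        p_next = (1 - restart) * (to_mut @ q) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p

# Toy example: gene 0 is mutated most often and touches every dysregulated gene.
freq = [0.30, 0.05, 0.05]
bip = [[1.0, 1.0, 1.0],
       [0.0, 0.0, 1.0],
       [0.0, 0.5, 0.0]]
scores = rank_driver_candidates(freq, bip)
ranking = np.argsort(-scores)  # indices of mutated genes, best candidate first
```

In this toy setting a gene scores highly when it is frequently altered and connected to many dysregulated genes, which mirrors the qualitative conclusion stated above. The actual weighting and update rule used by the authors may differ.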
