MADGiC: a model-based approach for identifying driver genes in cancer

Motivation: Identifying and prioritizing somatic mutations is an important and challenging area of cancer research that can provide new insights into gene function as well as new targets for drug development. Most methods for prioritizing mutations rely primarily on frequency-based criteria, where a gene is identified as having a driver mutation if it is altered in significantly more samples than expected according to a background model. Although useful, frequency-based methods are limited in that all mutations are treated equally. It is well known, however, that some mutations have no functional consequence, while others may have a major deleterious impact. The spatial pattern of mutations within a gene provides further insight into their functional consequence. Properly accounting for these factors improves both the power and accuracy of inference. Also important is an accurate background model.

Results: Here, we develop a Model-based Approach for identifying Driver Genes in Cancer (termed MADGiC) that incorporates both frequency and functional impact criteria and accommodates a number of factors to improve the background model. Simulation studies demonstrate advantages of the approach, including a substantial increase in power over competing methods. Further advantages are illustrated in an analysis of ovarian and lung cancer data from The Cancer Genome Atlas (TCGA) project.

Availability and implementation: R code to implement this method is available at http://www.biostat.wisc.edu/~kendzior/MADGiC/.

Contact: kendzior@biostat.wisc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
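To make the frequency-based criterion described under Motivation concrete, the following is a minimal sketch of a frequency-only test, not the MADGiC model itself. It assumes a simple binomial background in which each sample independently carries a mutation in the gene with probability `p_background`; the function names, the `alpha` threshold, and the example numbers are all hypothetical and chosen only for illustration.

```python
from math import comb


def binomial_tail(n_samples: int, n_mutated: int, p_background: float) -> float:
    """Return P(X >= n_mutated) for X ~ Binomial(n_samples, p_background)."""
    return sum(
        comb(n_samples, k)
        * p_background**k
        * (1 - p_background) ** (n_samples - k)
        for k in range(n_mutated, n_samples + 1)
    )


def frequency_based_call(
    n_samples: int, n_mutated: int, p_background: float, alpha: float = 0.05
) -> tuple[float, bool]:
    """Flag a gene when it is mutated in more samples than the background predicts.

    Assumption (illustrative only): every sample has the same chance
    p_background of carrying a passenger mutation in this gene.
    """
    p_value = binomial_tail(n_samples, n_mutated, p_background)
    return p_value, p_value < alpha


# Hypothetical gene: mutated in 20 of 100 samples, background rate 2% per sample.
p_value, is_candidate = frequency_based_call(100, 20, 0.02)
print(f"p-value = {p_value:.3g}, candidate driver = {is_candidate}")
```

Note that this test counts every mutation equally, which is exactly the limitation the abstract raises: it ignores the predicted functional impact of each mutation and its position within the gene.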
