MuSiC: Identifying mutational significance in cancer genomes

Massively parallel sequencing technology and the associated rapid decrease in sequencing costs have enabled systematic analyses of somatic mutations in large cohorts of cancer cases. Here we introduce a comprehensive mutational analysis pipeline that uses standardized sequence-based inputs along with multiple types of clinical data to establish correlations among mutation sites, affected genes and pathways, and ultimately to separate the abundant passenger mutations from the truly significant events. In other words, we aim to determine the Mutational Significance in Cancer (MuSiC) for these large data sets. By integrating these analytical operations into a single framework, MuSiC is applicable to a broad range of tumor types and offers the benefits of automation and standardization. Herein, we describe the computational structure and statistical underpinnings of the MuSiC pipeline and demonstrate its performance using 316 ovarian cancer samples from The Cancer Genome Atlas (TCGA) ovarian cancer project. MuSiC confirms many expected results and identifies several potentially novel avenues for discovery.
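
The abstract does not spell out how significant events are separated from passenger mutations. The code below is a minimal sketch of the general idea under stated assumptions, not MuSiC's actual statistic. It assumes a single uniform background mutation rate and applies a one-sided binomial test per gene, asking whether the observed mutation count over the bases covered (summed across samples) exceeds what that rate would produce. It then applies a Benjamini-Hochberg false discovery rate adjustment across genes. The function names (`binomial_upper_tail`, `gene_pvalues`, `benjamini_hochberg`), the gene labels, and the counts are all hypothetical.

```python
import math

# Hypothetical helper: a simplified illustration, not MuSiC's actual test.
def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        # Binomial probability mass at i, computed in log space for stability.
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        term = math.exp(log_term)
        total += term
        # Beyond the mean n*p each term is smaller than the last, so stop
        # once new terms no longer change the sum meaningfully.
        if i > n * p and term <= total * 1e-16:
            break
    return min(total, 1.0)


def gene_pvalues(counts: dict, background_rate: float) -> dict:
    """counts maps gene -> (mutations observed, bases covered summed over samples)."""
    return {gene: binomial_upper_tail(muts, bases, background_rate)
            for gene, (muts, bases) in counts.items()}


def benjamini_hochberg(pvals: dict) -> dict:
    """Benjamini-Hochberg FDR adjustment of a gene -> p-value mapping."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        gene, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[gene] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up toy inputs: GENE_A is mutated far above background, GENE_B is not.
    toy_counts = {"GENE_A": (12, 400_000), "GENE_B": (1, 500_000)}
    pvals = gene_pvalues(toy_counts, background_rate=2e-6)
    qvals = benjamini_hochberg(pvals)
    for gene in toy_counts:
        print(f"{gene}: p={pvals[gene]:.3g} q={qvals[gene]:.3g}")
```

With these toy numbers, GENE_A receives a vanishingly small p-value. GENE_B's single mutation is about what the background rate predicts, so its p-value stays large. A single uniform rate is the simplest possible background model. Practical tools typically refine it, for example by stratifying rates by mutation category and by sample, so this sketch should be read only as an illustration of the test-then-correct structure.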
