Maftools: efficient and comprehensive analysis of somatic variants in cancer

Numerous large-scale genomic studies of matched tumor-normal samples have established the somatic landscapes of most cancer types. However, downstream analysis of somatic mutation data entails a number of computational and statistical approaches, which typically requires combining many independent software tools. Here, we describe Maftools, an R/Bioconductor package that offers a multitude of analysis and visualization modules commonly used in cancer genomic studies, including driver gene identification, pathway, signature, enrichment, and association analyses. Maftools requires only somatic variants in Mutation Annotation Format (MAF) and is independent of larger alignment files. By implementing well-established statistical and computational methods, Maftools facilitates data-driven research and comparative analysis to discover novel results from publicly available data sets. In the present study, using three well-annotated cohorts from The Cancer Genome Atlas (TCGA), we describe the application of Maftools to reproduce known results. More importantly, we show that Maftools can also be used to uncover novel findings through integrative analysis.
