NIMBus: a negative binomial regression based Integrative Method for mutation Burden Analysis

Background: Identifying frequently mutated regions is a key approach to discovering DNA elements that influence cancer progression. However, these burdened regions are difficult to identify because mutation rates vary across the genome and between individuals. This heterogeneity stems in part from genomic confounding factors, such as replication timing and chromatin organization. The increasing availability of cancer whole-genome sequences and of functional genomics data from the Encyclopedia of DNA Elements (ENCODE) may help address these issues.

Results: We developed a Negative binomial regression-based Integrative Method for mutation Burden analysiS (NIMBus). Our approach addresses the over-dispersion of mutation counts by (1) using a Gamma-Poisson mixture model to capture mutation-rate heterogeneity across individuals and (2) estimating regional background mutation rates by regressing local mutation counts against genomic features extracted from ENCODE. We applied NIMBus to whole-genome cancer sequences from the PanCancer Analysis of Whole Genomes project (PCAWG) and other cohorts. It identified well-known coding and noncoding drivers, such as TP53 and the TERT promoter. To further characterize mutation burden in noncoding regions, we used NIMBus to screen transcription factor binding sites in promoter regions that intersect DNase I hypersensitive sites (DHSs). This analysis identified mutational hotspots that potentially disrupt gene regulatory networks in cancer. We also compared NIMBus with other mutation burden analysis methods.

Conclusion: NIMBus is a powerful tool for identifying mutational hotspots. The NIMBus software and results are available as an online resource at github.gersteinlab.org/nimbus.
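The two modeling ideas named in the Results can be illustrated with a short sketch. It is not the NIMBus implementation: the function names, the log-link form of the regression, and the parameterization by mean `mu` and dispersion `r` are assumptions made here for illustration. The sketch relies on the standard result that a Poisson count whose rate is Gamma-distributed (shape `r`, mean `mu`) follows a negative binomial distribution with mean `mu` and variance `mu + mu**2 / r`, so its upper tail is heavier than that of a Poisson with the same mean.

```python
import math


def predicted_mean(features, coefs, intercept, length_bp):
    """Hypothetical log-link regression: expected mutation count for a region.

    `features` stands in for covariates such as replication timing or
    chromatin signals; the coefficients would be fitted, not hand-set.
    """
    linear = intercept + sum(c * x for c, x in zip(coefs, features))
    return math.exp(linear) * length_bp


def nb_log_pmf(k, mu, r):
    """Log PMF of a negative binomial with mean mu and dispersion r.

    This is the marginal of Poisson(lam) with lam ~ Gamma(shape=r, scale=mu/r).
    """
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu))
            + k * math.log(mu / (r + mu)))


def nb_upper_tail(k_obs, mu, r):
    """P(K >= k_obs) under the negative binomial background model.

    Computed as 1 - CDF, which loses precision for extremely small
    p-values; adequate for a sketch.
    """
    below = sum(math.exp(nb_log_pmf(k, mu, r)) for k in range(k_obs))
    return max(0.0, 1.0 - below)


def poisson_upper_tail(k_obs, mu):
    """P(K >= k_obs) under a Poisson model with the same mean, for comparison."""
    below = sum(math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
                for k in range(k_obs))
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    mu, r, k_obs = 5.0, 2.0, 15  # illustrative values only
    print("NB p-value:     ", nb_upper_tail(k_obs, mu, r))
    print("Poisson p-value:", poisson_upper_tail(k_obs, mu))
```

With the same expected count, the Poisson model assigns a much smaller p-value to a high observed count than the negative binomial does. This is why ignoring over-dispersion tends to inflate the number of regions called as burdened.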
