Discovery and characterization of coding and non-coding driver mutations in more than 2,500 whole cancer genomes

Discovery of cancer drivers has traditionally focused on the identification of protein-coding genes. Here we present a comprehensive analysis of putative cancer driver mutations in both protein-coding and non-coding genomic regions across >2,500 whole cancer genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. We developed a statistically rigorous strategy for combining significance levels from multiple driver discovery methods and demonstrate that the integrated results overcome limitations of individual methods. We combined this strategy with careful filtering and applied it to protein-coding genes, promoters, untranslated regions (UTRs), distal enhancers and non-coding RNAs. These analyses redefine the landscape of non-coding driver mutations in cancer genomes, confirming a few previously reported elements and raising doubts about others, while identifying novel candidate elements across 27 cancer types. Novel recurrent events were found in the promoters or 5’UTRs of TP53, RFTN1, RNF34 and MTG2, in the 3’UTRs of NFKBIZ and TOB1, and in the non-coding RNA RMRP. We provide evidence that the previously reported non-coding RNAs NEAT1 and MALAT1 may be subject to a localized mutational process. Perhaps the most striking finding is the relative paucity of point mutations driving cancer in non-coding genes and regulatory elements. Although we have limited power to discover infrequent non-coding drivers in individual cohorts, a combined analysis of the promoters of known cancer genes shows little excess of mutations beyond TERT.
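The abstract does not spell out how significance levels from different driver discovery methods are merged. A common starting point is Fisher's method, which treats the statistic -2 Σ ln p_i as chi-square with 2k degrees of freedom under independence. Because driver discovery methods analyse the same mutations, their p-values are correlated, and Brown's method adjusts for this by matching the mean and variance of the statistic to a scaled chi-square distribution c·χ²_f. The Python sketch below implements both, plus Benjamini-Hochberg correction across elements, using only the standard library. It is an illustration under these assumptions, not the consortium's pipeline: the function names are hypothetical, and estimating the between-method covariances from p-values of elements assumed to carry no drivers is an assumed, hypothetical training step.

```python
import math


def _gamma_q(a: float, x: float) -> float:
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of the lower function P(a, x); return 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(x: float, df: float) -> float:
    """Upper tail P(X >= x) of a chi-square distribution with df degrees of freedom."""
    return _gamma_q(df / 2.0, x / 2.0)


def _fisher_stat(pvals):
    # Every p-value must lie in (0, 1]; under the null, -2 ln p ~ chi2(2).
    return -2.0 * sum(math.log(p) for p in pvals)


def fisher_combine(pvals):
    """Fisher's method: valid only if the tests are independent."""
    return chi2_sf(_fisher_stat(pvals), 2 * len(pvals))


def brown_combine(pvals, cov):
    """Brown's method; cov[i][j] is the null covariance of -2 ln p_i and -2 ln p_j."""
    k = len(pvals)
    mean = 2.0 * k  # each -2 ln p term has mean 2 ...
    var = 4.0 * k + 2.0 * sum(cov[i][j] for i in range(k) for j in range(i + 1, k))
    scale = var / (2.0 * mean)       # ... and variance 4
    dof = 2.0 * mean * mean / var    # moment-matched degrees of freedom
    return chi2_sf(_fisher_stat(pvals) / scale, dof)


def empirical_cov(null_pvals_by_method):
    """Sample covariance of -2 ln p between methods, from putatively null elements."""
    xs = [[-2.0 * math.log(p) for p in col] for col in null_pvals_by_method]
    n, k = len(xs[0]), len(xs)
    means = [sum(x) / n for x in xs]
    return [[sum((xs[i][t] - means[i]) * (xs[j][t] - means[j]) for t in range(n)) / (n - 1)
             for j in range(k)] for i in range(k)]


def benjamini_hochberg(pvals):
    """BH-adjusted q-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step down from the largest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

As a sanity check on the design choice, suppose two methods are perfectly correlated (each -2 ln p has variance 4, so the covariance is also 4) and both report p = 0.05. `fisher_combine([0.05, 0.05])` returns about 0.017, overstating the evidence, whereas `brown_combine([0.05, 0.05], [[4, 4], [4, 4]])` returns 0.05, as a single test would.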

Lucas Lochovsky | Donghoon Lee | Chris Sander | Nuno A. Fonseca | Benjamin J. Raphael | Tatsuhiko Tsunoda | Jan Komorowski | Manolis Kellis | Peter J. Campbell | Nicholas A. Sinnott-Armstrong | Tobias Madsen | Qianyun Guo | Johanna Bertl | Asger Hobolth | Jakob Skou Pedersen | David Tamborero | Calvin Wing Yiu Chan | Keith A. Boroevich | Gad Getz | Julian M. Hess | Andre Kahles | Esther Rheinbay | Grace Tiao | Malene Juul | Michael S. Lawrence | Yosef E. Maruvka | Morten Muhlig Nielsen | Henrik Hornshøj | Iñigo Martincorena | Mark A. Rubin | Lars Feuerbach | Carl Herrmann | Mark B. Gerstein | Kjong-Van Lehmann | Nicholas J. Haradhvala | Jüri Reimand | Radhakrishnan Sabarinathan | Klev Diamanti | Loris Mularoni | Josh Stuart | Gordon Saksena | Priyanka Dhingra | Abel Gonzalez-Perez | Lina Sieverling | David A. Wheeler | Federico Abascal | Samirkumar B. Amin | Ekta Khurana | Shimin Shuai | Ciyue Shen | Mark P. Hamilton | Keren Isaev | Todd A. Johnson | Youngwook Kim | Sushant Kumar | Eric Minwei Liu | Oriol Pich | Husen M. Umer | Liis Uusküla-Reimand | Claes Wadelius | Lina Wadi | Lincoln D. Stein | Núria López-Bigas | Abdullah Kahraman | Rory Johnson | Jaegil Kim | Keunchil Park | Jing Zhang | Dimple Chakravarty | Joana Carlevaro-Fita | Andrés Lanzós | Chen Hong | Randi Istrup Pedersen