Improved detection of gene fusions by applying statistical methods reveals oncogenic RNA cancer drivers

Significance Gene fusions are tumor-specific genomic aberrations and are among the most powerful biomarkers and drug targets in translational cancer biology. Over the last decade, RNA-sequencing technologies have provided a unique opportunity to detect novel fusions by applying computational algorithms to public sequencing databases. However, precise fusion detection algorithms are still out of reach. We develop Data-Enriched Efficient PrEcise STatistical fusion detection (DEEPEST), a highly specific and efficient statistical pipeline designed for mining massive sequencing databases, and apply it to all 33 tumor types and 10,500 samples in The Cancer Genome Atlas database. We systematically profile the landscape of detected fusions with classic statistical models and identify several signatures of selection for fusions in tumors.

The extent to which gene fusions function as drivers of cancer remains a critical open question. Current algorithms do not sufficiently identify false-positive fusions arising during library preparation, sequencing, and alignment. Here, we introduce Data-Enriched Efficient PrEcise STatistical fusion detection (DEEPEST), an algorithm that uses statistical modeling to minimize false positives while increasing the sensitivity of fusion detection. In 9,946 tumor RNA-sequencing datasets from The Cancer Genome Atlas (TCGA) across 33 tumor types, DEEPEST identifies 31,007 fusions, 30% more than identified by other methods, while calling 10-fold fewer false-positive fusions in nontransformed human tissues. We leverage the increased precision of DEEPEST to discover fundamental cancer biology. Namely, we identify 888 candidate oncogenes based on overrepresentation in DEEPEST calls, as well as 1,078 previously unreported fusions involving long intergenic noncoding RNAs, demonstrating a previously unappreciated prevalence and potential for function. DEEPEST also reveals a high enrichment for fusions involving oncogenes in cancers, including ovarian cancer, which has had minimal treatment advances in recent decades, and finds that more than 50% of tumors harbor gene fusions predicted to be oncogenic. Specific protein domains are enriched in DEEPEST calls, indicating a global selection for fusion functionality: kinase domains are nearly 2-fold enriched in DEEPEST calls relative to chance expectation, as are domains involved in (anaerobic) metabolism and DNA binding. The statistical algorithms, population-level analytic framework, and biological conclusions of DEEPEST call for increased attention to gene fusions as drivers of cancer and for future research into using fusions for targeted therapy.
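As a rough illustration of what "enriched relative to chance" can mean for protein domains, the sketch below computes a fold enrichment and a one-sided hypergeometric p-value, then adjusts p-values across domains for multiple testing under dependency. This is not the DEEPEST pipeline or its null model: the function names, the choice of a hypergeometric null, and all counts in the usage note are assumptions made for illustration only.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items,
    of which K carry the feature (e.g. a kinase domain). Assumes k >= 0."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def fold_enrichment(k, N, K, n):
    """Observed fraction with the feature divided by the background fraction."""
    return (k / n) / (K / N)


def benjamini_yekutieli(pvals):
    """Benjamini-Yekutieli adjusted p-values (FDR control under dependency)."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A hypothetical use: with a background of `N` annotated genes of which `K` contain a given domain, and `n` fusion partner genes of which `k` contain it, `fold_enrichment(k, N, K, n)` gives the effect size and `hypergeom_sf(k, N, K, n)` the one-sided p-value; collecting one p-value per domain and passing the list to `benjamini_yekutieli` yields adjusted values that can be thresholded at a chosen false discovery rate.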
