confFuse: High-Confidence Fusion Gene Detection across Tumor Entities

Background: Fusion genes play an important role in the tumorigenesis of many cancers. Over the last several years, next-generation sequencing (NGS) has been applied successfully to fusion gene detection, and a number of NGS-based tools have been developed for this purpose. Most RNA-seq-based fusion detection tools report a large number of candidates, most of them false positives, which makes it hard to prioritize candidates for experimental validation and further analysis. Selecting reliable fusion genes for downstream analysis is therefore important in cancer research. We developed confFuse, a scoring algorithm that reliably selects high-confidence fusion genes that are likely to be biologically relevant.

Results: confFuse takes multiple parameters into account to assign each fusion candidate a confidence score; a score ≥8 indicates a high-confidence fusion gene prediction. These parameters were manually curated based on our experience and on certain structural motifs of fusion genes. Compared with alternative tools on 96 published RNA-seq samples from different tumor entities, our method significantly reduces the number of fusion candidates (301 high-confidence fusions out of 8,083 total predicted fusion genes) while maintaining high detection accuracy (recovery rate 85.7%). Validation of 18 novel, high-confidence fusions detected in three breast tumor samples resulted in a 100% validation rate.

Conclusions: confFuse is a novel downstream filtering method that selects highly reliable fusion gene candidates for further downstream analysis and experimental validation. confFuse is available at https://github.com/Zhiqin-HUANG/confFuse.
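The score cut-off described in the Results can be illustrated with a minimal Python sketch. It is not the confFuse implementation: the abstract does not list the scoring parameters, so the sketch assumes each candidate already carries a precomputed score. The `FusionCandidate` class, its field names, the placeholder gene names, and the example scores are all hypothetical; only the threshold of 8 and the counts 301 and 8,083 come from the abstract.

```python
from dataclasses import dataclass

# Cut-off stated in the abstract: a score >= 8 marks a high-confidence prediction.
HIGH_CONFIDENCE_THRESHOLD = 8


@dataclass
class FusionCandidate:
    """Hypothetical record for one predicted fusion (not the confFuse data model)."""
    gene_5p: str   # placeholder name of the 5' partner gene
    gene_3p: str   # placeholder name of the 3' partner gene
    score: int     # assumed to be computed elsewhere by the scoring step


def select_high_confidence(candidates, threshold=HIGH_CONFIDENCE_THRESHOLD):
    """Keep only candidates whose score meets or exceeds the threshold."""
    return [c for c in candidates if c.score >= threshold]


def retained_fraction(n_selected, n_total):
    """Fraction of all predicted candidates that survive the filter."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_selected / n_total


# Illustrative input with made-up scores and placeholder gene names.
candidates = [
    FusionCandidate("GENE_A", "GENE_B", 9),
    FusionCandidate("GENE_C", "GENE_D", 8),
    FusionCandidate("GENE_E", "GENE_F", 7),
    FusionCandidate("GENE_G", "GENE_H", 3),
]

selected = select_high_confidence(candidates)
print([(c.gene_5p, c.gene_3p) for c in selected])

# Counts reported in the abstract: 301 high-confidence out of 8,083 predictions.
print(f"{retained_fraction(301, 8083):.1%}")  # prints 3.7%
```

With the reported counts, the filter retains roughly 3.7% of all predicted fusion genes, which is the scale of reduction the Results describe.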
