FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution

MOTIVATION: Next-generation sequencing technology generates high-throughput data, which allows us to detect fusion genes at both the transcript and genomic levels. To detect fusion genes, current bioinformatics tools rely heavily on paired-end approaches and overlook the importance of reads that span fusion junctions. There is therefore a need for an efficient aligner that detects fusion events by accurately mapping these junction-spanning single reads, particularly as reads get longer with improvements in sequencing technology.

RESULTS: We present a novel method, FusionMap, which aligns fusion reads directly to the genome without prior knowledge of potential fusion regions. FusionMap can detect fusion events in both single- and paired-end datasets from either RNA-Seq or gDNA-Seq studies and characterize fusion junctions at base-pair resolution. We showed that FusionMap achieved high sensitivity and specificity in fusion detection on two simulated RNA-Seq datasets, which contained 75 nt paired-end reads. FusionMap achieved substantially higher sensitivity and specificity than the paired-end approach when the inner distance between read pairs was small. Using FusionMap to characterize fusion genes in the K562 chronic myeloid leukemia cell line, we further demonstrated its accuracy in fusion detection in both single-end RNA-Seq and gDNA-Seq datasets. These combined results show that FusionMap provides an accurate and systematic solution for detecting fusion events through junction-spanning reads.

AVAILABILITY: FusionMap includes reference indexing, read filtering, fusion alignment and reporting in one package. The software is free for noncommercial use at http://www.omicsoft.com/fusionmap.
