Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data

RNA-Seq has made possible the global identification of fusion transcripts, i.e. “chimeric RNAs”. Although various software packages have been developed for this purpose, they perform differently on the different datasets provided by their developers. Both users and developers need an unbiased assessment of the performance of existing fusion detection tools. Toward this goal, we compared the performance of 12 well-known fusion detection software packages. We evaluated the sensitivity, false discovery rate, computing time, and memory usage of these tools on four different datasets (positive, negative, mixed, and test). We conclude that some tools are better than others in terms of sensitivity, positive predictive value, time consumption, and memory usage. We also observed small overlaps among the fusions detected by different tools in the real dataset (test dataset). This could be due to false discoveries by various tools, but could also mean that no single tool is inclusive. We found that the performance of the tools depends on the quality, read length, and number of reads of the RNA-Seq data. We recommend that users choose tools suited to their purpose based on the properties of their RNA-Seq data.
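As an illustration of the metrics named above, the following is a minimal sketch of how sensitivity, positive predictive value, false discovery rate, and the overlap between two tools' calls could be computed from sets of fusion gene pairs. It is not the evaluation code used in the study. The function names `evaluate_calls` and `jaccard_overlap`, the representation of a fusion as a (5' gene, 3' gene) tuple, the use of the Jaccard index as the overlap measure, and the toy call sets are all assumptions made for this example.

```python
def evaluate_calls(predicted, truth):
    """Score one tool's fusion calls against a known truth set.

    Both arguments are iterables of (five_prime_gene, three_prime_gene)
    tuples. This pairing scheme is an assumption of the sketch.
    """
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # fusions correctly reported
    fp = len(predicted - truth)   # reported but not in the truth set
    fn = len(truth - predicted)   # true fusions that were missed
    sensitivity = tp / (tp + fn) if truth else float("nan")
    ppv = tp / (tp + fp) if predicted else float("nan")
    fdr = fp / (tp + fp) if predicted else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "ppv": ppv, "fdr": fdr}


def jaccard_overlap(calls_a, calls_b):
    """Fraction of fusions shared by two tools (Jaccard index)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    return len(a & b) / len(union) if union else float("nan")


# Toy data, invented for illustration only (not results from the study).
truth = {("TMPRSS2", "ERG"), ("NPM1", "ALK"), ("BCR", "ABL1"), ("EML4", "ALK")}
tool_a = {("TMPRSS2", "ERG"), ("BCR", "ABL1"), ("GENE1", "GENE2")}
tool_b = {("TMPRSS2", "ERG"), ("EML4", "ALK")}

scores_a = evaluate_calls(tool_a, truth)
overlap_ab = jaccard_overlap(tool_a, tool_b)
print(scores_a)
print(overlap_ab)
```

On a dataset with a known truth set (such as a simulated positive dataset), `evaluate_calls` gives the per-tool metrics; on a real dataset without ground truth, only a pairwise comparison such as `jaccard_overlap` is available, which is why low overlap alone cannot separate false discoveries from incomplete detection.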
