Reproducible, Scalable Fusion Gene Detection from RNA-Seq

Chromosomal rearrangements that create novel gene products, termed fusion genes, have been identified as driving events in the development of multiple types of cancer. Because these gene products typically do not exist in normal cells, they represent valuable prognostic and therapeutic targets. Advances in next-generation sequencing and computational approaches have greatly improved our ability to detect and identify fusion genes. Nevertheless, these approaches require significant computational resources. Here we describe an approach that leverages cloud computing technologies to perform fusion gene detection from RNA sequencing data at any scale. We additionally highlight methods for enhancing the reproducibility of bioinformatics analyses, which may be applied to any next-generation sequencing experiment.
