ChimeRScope: a novel alignment-free algorithm for fusion transcript prediction using paired-end RNA-Seq data

Abstract RNA-Seq technology has revolutionized transcriptome characterization, not only by accurately quantifying gene expression but also by identifying novel transcripts such as chimeric fusion transcripts. Fusion (or chimeric) transcripts have improved the diagnosis and prognosis of several tumors and have led to the development of novel therapeutic regimens. Fusion transcript detection is currently performed by several software packages that rely primarily on sequence alignment algorithms. Aligning sequencing reads from fusion transcript loci in cancer genomes can be highly challenging because genomic alterations induce incorrect mapping, which limits the performance of alignment-based fusion transcript detection methods. Here, we developed ChimeRScope, a novel alignment-free method that accurately predicts fusion transcripts based on the gene fingerprint (k-mer) profiles of RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets, followed by experimental validation, demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of read length and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is capable of identifying novel fusion transcripts with potential oncogenic functions. ChimeRScope is accessible as standalone software at https://github.com/ChimeRScope/ChimeRScope/wiki or via the Galaxy web interface at https://galaxy.unmc.edu/.
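The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea behind alignment-free, k-mer based fusion detection; it is not ChimeRScope's implementation. It assumes that a gene "fingerprint" is the set of k-mers found in exactly one gene, that each read is assigned to the gene with the most fingerprint hits, and that a read pair supports a fusion when its two mates are assigned to different genes. The function names, the value k = 5, the gene names, and the sequences are all hypothetical, and both mates are assumed to be already oriented on the transcript strand.

```python
from collections import Counter, defaultdict

K = 5  # toy k-mer length (assumption); real data would need a much longer k


def kmers(seq, k=K):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_fingerprints(genes, k=K):
    """Map each k-mer that occurs in exactly one gene to that gene."""
    owners = defaultdict(set)
    for name, seq in genes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    return {km: next(iter(g)) for km, g in owners.items() if len(g) == 1}


def assign_read(read, fingerprints, k=K):
    """Return the gene with the most fingerprint hits in read, or None."""
    votes = Counter(
        fingerprints[km] for km in kmers(read, k) if km in fingerprints
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def fusion_candidates(pairs, fingerprints, k=K):
    """Count read pairs whose two mates are assigned to different genes."""
    support = Counter()
    for mate1, mate2 in pairs:
        g1 = assign_read(mate1, fingerprints, k)
        g2 = assign_read(mate2, fingerprints, k)
        if g1 and g2 and g1 != g2:
            support[tuple(sorted((g1, g2)))] += 1
    return support


# Hypothetical toy transcripts and read pairs, for illustration only.
genes = {
    "GENE_A": "ACGTACGGTCAGTCCATG",
    "GENE_B": "TTGACCGATAGGCTAACG",
}
pairs = [
    ("ACGTACGGTC", "GATAGGCTAA"),  # mates come from different genes
    ("ACGTACGGTC", "GTCAGTCCAT"),  # both mates come from GENE_A
]

fingerprints = build_fingerprints(genes)
support = fusion_candidates(pairs, fingerprints)
print(dict(support))  # {('GENE_A', 'GENE_B'): 1}
```

No read is aligned to a reference here; each read is only looked up in a k-mer table, which is the property that sets alignment-free approaches apart from alignment-based ones. A practical tool would also need to handle reverse-complemented mates, sequencing errors, homologous genes, and a minimum-support threshold, none of which this sketch attempts.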
