ViralFusionSeq: accurately discover viral integration events and reconstruct fusion transcripts at single-base resolution

Summary: Insertional mutagenesis caused by viral infection is an important pathogenic risk factor for the development of cancer. Despite the advent of high-throughput sequencing, the discovery of viral integration sites and expressed viral fusion events is still limited. Here, we present ViralFusionSeq (VFS), which combines soft-clipping information, read-pair analysis and targeted de novo assembly to discover and annotate viral–human fusions. We applied VFS to an RNA-Seq experiment, a simulated DNA-Seq experiment and a re-analysis of published DNA-Seq datasets. These experiments demonstrated that VFS is both sensitive and highly accurate.

Availability: VFS is distributed under GPL version 3 at http://hkbic.cuhk.edu.hk/software/viralfusionseq

Contact: tf.chan@cuhk.edu.hk

Supplementary information: Supplementary data are available at Bioinformatics Online.
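The summary names soft-clipping information and read-pair analysis as two of the evidence sources VFS combines. The Python sketch below is a hypothetical illustration of how those two signals could be counted from alignments; it is not the VFS implementation. The `Alignment` record, the function names `soft_clip_breakpoint` and `find_fusion_evidence`, the `min_clip` and `bin_size` defaults, and the exact-substring test that stands in for aligning clipped bases to the viral genome are all assumptions. It also assumes reads were aligned to a combined human-plus-virus reference, and it omits the targeted de novo assembly step.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical sketch, NOT the VFS implementation.
# Hypothetical minimal alignment record (a real pipeline would parse SAM/BAM).
@dataclass
class Alignment:
    name: str
    ref: str       # reference the read aligned to, e.g. "chr8" or "HBV"
    pos: int       # 1-based leftmost aligned reference base, as in SAM POS
    cigar: str
    seq: str
    mate_ref: str  # reference the mate aligned to

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def soft_clip_breakpoint(aln, min_clip=10):
    """Return (breakpoint, clipped_seq) for a read with a long soft clip, else None.

    The breakpoint is the human base adjacent to the clipped (putatively viral)
    bases. If both ends are clipped, only the leading clip is reported.
    """
    # Hard-clipped bases are absent from SEQ, so drop H operations.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(aln.cigar) if op != "H"]
    if not ops:
        return None
    ref_span = sum(n for n, op in ops if op in REF_CONSUMING)
    lead_len, lead_op = ops[0]
    tail_len, tail_op = ops[-1]
    if lead_op == "S" and lead_len >= min_clip:
        # Clipped bases precede the first aligned reference base.
        return aln.pos, aln.seq[:lead_len]
    if tail_op == "S" and tail_len >= min_clip:
        # Clipped bases follow the last aligned reference base.
        return aln.pos + ref_span - 1, aln.seq[-tail_len:]
    return None

def find_fusion_evidence(alignments, human_refs, viral_ref, viral_genome,
                         min_clip=10, bin_size=1000):
    """Count split-read and discordant-pair support for human-viral junctions."""
    clip_support = Counter()  # (chrom, exact breakpoint) -> supporting reads
    pair_support = Counter()  # (chrom, pos // bin_size) -> discordant pairs
    for aln in alignments:
        if aln.ref not in human_refs:
            continue  # count each pair once, from its human-side mate
        hit = soft_clip_breakpoint(aln, min_clip)
        # Simplification: an exact substring match stands in for aligning
        # the clipped bases to the viral genome.
        if hit is not None and hit[1] in viral_genome:
            clip_support[(aln.ref, hit[0])] += 1
        if aln.mate_ref == viral_ref:
            pair_support[(aln.ref, aln.pos // bin_size)] += 1
    return clip_support, pair_support

# Usage on toy data (all sequences and coordinates are made up).
viral = "GATTACAGATTACACCGGTTAACC"
reads = [
    Alignment("r1", "chr8", 1000, "12S40M", "GATTACAGATTA" + "T" * 40, "chr8"),
    Alignment("r2", "chr8", 2000, "50M11S", "G" * 50 + "ACCGGTTAACC", "chr8"),
    Alignment("r3", "chr8", 2100, "60M", "A" * 60, "HBV"),
    Alignment("r4", "chr8", 3000, "12S40M", "T" * 12 + "C" * 40, "chr8"),
]
clips, pairs = find_fusion_evidence(reads, {"chr8"}, "HBV", viral)
# clips: one split read at chr8:1000 and one at chr8:2049 (r4's clip is not viral)
# pairs: one discordant pair in the chr8 bin starting at 2000
```

A soft-clipped read pins the junction to a single base, whereas a discordant pair only localizes it to a window, which is why the sketch keys the two counters differently.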
