BATVI: Fast, sensitive and accurate detection of virus integrations

Background: The study of virus integrations in the human genome is important because virus integrations have been shown to be associated with diseases. A few methods in the literature predict virus integrations from next-generation sequencing datasets. Although they work, they are slow and not very sensitive.

Results and discussion: This paper introduces a new method, BatVI, to predict viral integrations. Our method uses a fast screening step to select chimeric reads that may contain viral integrations. Next, sensitive alignments of these candidate chimeric reads are computed with BLAST. Chimeric reads that are co-localized in the human genome are then clustered. Finally, the chimeric reads in each cluster are assembled to extract high-confidence virus integration sites.

Conclusion: We compared the performance of BatVI with the existing methods VirusFinder and VirusSeq on both simulated datasets and real datasets from liver cancer patients. BatVI ran an order of magnitude faster and predicted almost twice as many true positives as the other methods while maintaining a false positive rate of less than 1%. For the liver cancer datasets, BatVI uncovered novel integrations in two important genes, TERT and MLL4, which were missed by previous studies. We verified the correctness of these additional integrations using gene expression data.

BatVI can be downloaded from http://biogpu.ddns.comp.nus.edu.sg/~ksung/batvi/index.html.
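The clustering step mentioned in the abstract (grouping chimeric reads that are co-localized in the human genome) could look roughly like the sketch below. This is not BatVI's implementation: the `ChimericRead` record, the `cluster_reads` function, and the `max_gap` threshold of 500 bp are all hypothetical names and values chosen for illustration. The sketch assumes each candidate read has already been aligned and reduced to a chromosome and a breakpoint position on the human genome.

```python
from collections import defaultdict
from typing import NamedTuple


class ChimericRead(NamedTuple):
    """Hypothetical record for a candidate chimeric read (not BatVI's format)."""
    name: str
    chrom: str
    human_pos: int  # assumed breakpoint position on the human genome


def cluster_reads(reads, max_gap=500):
    """Group reads on the same chromosome whose sorted positions are
    within max_gap bases of the previous read (single-linkage clustering).
    max_gap is an assumed value, not a parameter taken from the paper."""
    by_chrom = defaultdict(list)
    for read in reads:
        by_chrom[read.chrom].append(read)

    clusters = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda r: r.human_pos)
        current = [ordered[0]]
        for read in ordered[1:]:
            if read.human_pos - current[-1].human_pos <= max_gap:
                current.append(read)      # close enough: same cluster
            else:
                clusters.append(current)  # gap too large: start a new cluster
                current = [read]
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Made-up coordinates, for demonstration only.
    example = [
        ChimericRead("r1", "chr5", 1295000),
        ChimericRead("r2", "chr5", 1295100),
        ChimericRead("r3", "chr5", 1400000),
        ChimericRead("r4", "chr19", 36200000),
    ]
    for cluster in cluster_reads(example):
        print(cluster[0].chrom, [r.name for r in cluster])
```

Each resulting cluster would then be a candidate integration site whose reads could be passed to an assembly step. A real tool would likely also take read orientation and the viral-side alignment into account when deciding which reads belong together.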
