Detecting exact breakpoints of deletions with diversity in hepatitis B viral genomic DNA from next-generation sequencing data.

Many studies have suggested that deletions in the hepatitis B virus (HBV) genome are associated with the development of progressive liver diseases, which can ultimately result in hepatocellular carcinoma (HCC). Among the methods for detecting deletions from next-generation sequencing (NGS) data, few consider the characteristics of viruses, such as high evolution rates and high divergence among different HBV genomes. Sequencing highly divergent HBV genomes with NGS technology produces millions of reads, so detecting the exact breakpoints of deletions in such large and complex data incurs a very high computational cost. We propose a novel analytical method named VirDelect (Virus Deletion Detect), which uses split-read alignment to detect exact breakpoints and a diversity variable to account for high divergence in single-end read data, so that the computational cost can be reduced without losing accuracy. We use four simulated read datasets and two real paired-end read datasets of HBV genome sequences to verify the accuracy of VirDelect with score functions. The experimental results show that VirDelect outperforms the state-of-the-art method Pindel in accuracy score on all simulated datasets, and that it had only two base errors on the real datasets. VirDelect is also shown to deliver high accuracy on single-end read data as well as paired-end data. VirDelect can serve as an accurate and efficient bioinformatics tool for physiologists, and it is applicable to further analysis of genomes that resemble HBV in length and high divergence. The VirDelect software can be downloaded at https://sourceforge.net/projects/virdelect/.
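The general idea of split-read breakpoint detection can be illustrated with a minimal sketch. This is not the VirDelect implementation, whose details the abstract does not give. The function name `find_deletion` and the parameters `min_anchor` and `max_mismatches` are hypothetical, and the per-anchor mismatch tolerance is only a simple stand-in for tolerating divergence from the reference, not the paper's diversity variable. The sketch splits a read into a left and a right anchor, places both on the reference by brute force, and reports the gap between them as the deleted interval.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_deletion(reference: str, read: str, min_anchor: int = 8,
                  max_mismatches: int = 0):
    """Return (start, end) of a deleted reference interval [start, end), or None.

    Every split of the read into a left and a right anchor (each at least
    min_anchor long) is tried. Each anchor may differ from the reference at
    up to max_mismatches positions. The placement with the fewest total
    mismatches wins; ties go to the first one found (smallest split point).
    """
    best = None  # (total_mismatches, del_start, del_end)
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        for i in range(len(reference) - len(left) + 1):
            left_mm = hamming(reference[i:i + split], left)
            if left_mm > max_mismatches:
                continue
            del_start = i + split  # first deleted reference position
            # The right anchor must start strictly downstream of del_start.
            for j in range(del_start + 1, len(reference) - len(right) + 1):
                right_mm = hamming(reference[j:j + len(right)], right)
                if right_mm > max_mismatches:
                    continue
                total = left_mm + right_mm
                if best is None or total < best[0]:
                    best = (total, del_start, j)
    return None if best is None else (best[1], best[2])


# Toy example: a 40-base reference and a read spanning a deletion of [12, 25).
reference = "ATGGCTAGCTTACGGATCCGTTAACGTCAGGCATTGACCA"
read = reference[2:12] + reference[25:35]
print(find_deletion(reference, read))  # (12, 25)
```

The brute-force search is quadratic in the reference length and is meant only to make the breakpoint logic explicit; a practical tool would use an indexed aligner to place the anchors. When the bases flanking a deletion are identical, several breakpoint positions explain the same read equally well, and this sketch reports the leftmost one.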
