VirGenA: a reference‐based assembler for variable viral genomes

Abstract: Characterizing the within‐host genetic diversity of viral pathogens is required to select effective treatment for some important viral infections, e.g. HIV, HBV and HCV. Although low‐frequency variants can be detected technically, data on their clinical significance are conflicting, partly because they are difficult to distinguish from experimental artifacts. Cross‐contamination is a concern for all highly sensitive techniques, including deep sequencing: even trace contamination leads to a significant increase in false‐positive single‐nucleotide variants (SNVs). Detecting infections by multiple genotypes of some viruses, whose incidence can be considerable, especially in risk groups, is also clinically significant in some cases. We developed a new viral reference‐guided assembler, VirGenA, that can separate mixtures of strains of different intraspecies genetic groups (genotypes, subtypes, clades, etc.) and assemble a separate consensus sequence for each group in a mixture. It produces long assemblies for mixture components of extremely low frequencies (<1%), allowing detection of cross‐contamination of samples by divergent genotypes. We tested VirGenA on both clinical and simulated data. On both types of data, VirGenA shows results better than or similar to those of existing de novo assemblers. A cross‐platform implementation (including source code) is freely available at https://github.com/gFedonin/VirGenA/releases.
