SNAPPy: A snakemake pipeline for scalable HIV-1 subtyping by phylogenetic pairing

Human immunodeficiency virus 1 (HIV-1) genome sequencing is routinely performed for drug resistance monitoring in hospitals worldwide. Subtyping the resulting large datasets of HIV-1 sequences is a critical first step in molecular epidemiology and surveillance studies. The clinical relevance of HIV-1 subtypes is increasingly recognized: several studies suggest subtype-related differences in disease progression, transmission route efficiency, immune evasion, and even therapeutic outcomes. HIV-1 subtyping is mainly done using web servers, which have limited scalability and may not comply with data protection legislation. The aim of this work was therefore to develop an efficient method for local, high-throughput HIV-1 subtyping. We designed SNAPPy, a Snakemake pipeline for scalable HIV-1 subtyping by phylogenetic pairing. It comprises several phylogenetic inference and BLAST query tasks, which can be executed sequentially or in parallel to take advantage of multi-core processors. Although it was built for subtyping, SNAPPy is also useful for producing large HIV-1 alignments. The tool facilitates large-scale sequence-based HIV-1 research by providing a local, resource-efficient, and scalable alternative for HIV-1 subtyping. It can analyse full-length genomes or partial HIV-1 genomic regions (GAG, POL, ENV) and recognizes more than 90 circulating recombinant forms. SNAPPy is freely available at https://github.com/PMMAraujo/snappy.
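
The idea of running independent per-sequence tasks either sequentially or in parallel can be illustrated with a minimal Python sketch. It is not SNAPPy's code: the function names `gc_fraction` and `run_tasks` and the `cores` parameter are hypothetical, and the toy GC-content calculation only stands in for a real task such as a BLAST query or a tree inference.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_fraction(sequence: str) -> float:
    """Hypothetical stand-in for one independent per-sequence task."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def run_tasks(sequences: dict[str, str], cores: int = 1) -> dict[str, float]:
    """Apply the task to every sequence, sequentially or with a worker pool."""
    names = list(sequences)
    if cores <= 1:
        # Sequential execution: one task after another.
        results = [gc_fraction(sequences[name]) for name in names]
    else:
        # Parallel execution: tasks share no state, so the order in which
        # they run does not affect the result; map() preserves input order.
        with ThreadPoolExecutor(max_workers=cores) as pool:
            results = list(pool.map(gc_fraction, (sequences[n] for n in names)))
    return dict(zip(names, results))


if __name__ == "__main__":
    toy = {"seq1": "GGCC", "seq2": "ATAT", "seq3": "ATGC"}
    print(run_tasks(toy, cores=1))
    print(run_tasks(toy, cores=4))
```

Because each task depends only on its own input, both modes return the same result. In a workflow manager such as Snakemake this scheduling is handled by the engine, with the number of cores passed on the command line.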
