An Improved Sequencing-Based Bioinformatics Pipeline to Track the Distribution and Clonal Architecture of Proviral Integration Sites

The combined application of linear amplification-mediated PCR (LAM-PCR) protocols with next-generation sequencing (NGS) has substantially advanced our understanding of retroviral pathogenesis. Considerable effort has already been expended to optimize NGS methods for exploring the genome-wide distribution of proviral integration sites and the clonal architecture of clinically important retroviruses such as human T-cell leukemia virus type-1 (HTLV-1). Once sequencing data are generated, rigorous bioinformatics analysis is central to their biological interpretation. To better exploit the information available through these methods, we developed an optimized bioinformatics pipeline to analyze NGS clonality datasets. We found that short-read aligners, which are specifically designed to handle NGS datasets, significantly reduce processing time and computational burden while also accounting for sequencing base quality. We demonstrate the utility of an additional trimming step in the workflow, which adjusts for the number of reads supporting each insertion site. In addition, we developed a recall procedure to reduce the bias associated with proviral integration within low-complexity regions of the genome, providing a more accurate estimate of clone abundance. Finally, we recommend applying a “clean-and-recover” step to clonality datasets generated from large cohorts and longitudinal studies. In summary, we report an optimized bioinformatics workflow for NGS clonality analysis and describe a new set of steps to guide the computational process. We demonstrate that applying this protocol to HTLV-1 and bovine leukemia virus (BLV) clonality datasets improves the quality of data processing and provides a more accurate definition of the clonal landscape in infected individuals.
The optimized workflow and analysis recommendations can be implemented in the majority of bioinformatics pipelines developed to analyze LAM-PCR-based NGS clonality datasets.
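
To make the idea of read-supported clone abundance concrete, the following is a minimal sketch and not the authors' implementation. It assumes each mapped read has already been reduced to an insertion site, written as a (chromosome, position, strand) tuple. The function name `clone_abundance` and the `min_reads` threshold are hypothetical, and a simple read-count cutoff only stands in for the trimming step, whose exact rule is not described here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical representation of an insertion site: (chromosome, position, strand).
Site = Tuple[str, int, str]


def clone_abundance(sites: Iterable[Site], min_reads: int = 2) -> Dict[Site, float]:
    """Return the relative abundance of each insertion site.

    Sites supported by fewer than `min_reads` reads are discarded first
    (an assumed stand-in for a read-support filter), and abundances are
    then computed over the remaining reads only.
    """
    # Count how many reads support each insertion site.
    counts = Counter(sites)
    # Keep only sites with enough supporting reads.
    kept = {site: n for site, n in counts.items() if n >= min_reads}
    total = sum(kept.values())
    if total == 0:
        return {}
    # Normalize read counts so the abundances sum to 1.
    return {site: n / total for site, n in kept.items()}


if __name__ == "__main__":
    reads = (
        [("chr1", 100, "+")] * 6
        + [("chr2", 200, "-")] * 2
        + [("chr3", 300, "+")]  # single read, removed by the filter
    )
    for site, fraction in clone_abundance(reads).items():
        print(site, fraction)
```

Normalizing after filtering means that poorly supported sites do not dilute the abundance of well-supported clones. Whether that is appropriate depends on the dataset and on how the threshold is chosen.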