MLV integration site selection is driven by strong enhancers and active promoters

Retroviruses integrate into the host genome in patterns specific to each virus. Understanding the causes of these patterns can provide insight into viral integration mechanisms, pathology and genome evolution, and is critical to the development of safe gene therapy vectors. We generated murine leukemia virus (MLV) integrations in human HepG2 and K562 cells and subjected them to second-generation sequencing, using a DNA barcoding technique that allowed us to quantify independent integration events. In total, we characterized >3 700 000 unique integration events across these two ENCODE-characterized cell lines. Integrations were most highly enriched in a subset of strong enhancers and active promoters. In both cell types, approximately half the integrations were found in <2% of the genome, demonstrating that integration is targeted to an even narrower portion of the genome than previously believed. The integration pattern of MLV appears to be largely driven by regions that are highly enriched for multiple marks of active chromatin; the combination of histone marks present was sufficient to explain why some strong enhancers were more prone to integration than others. The approach we used is applicable to analyzing the integration pattern of any exogenous element and could be a valuable preclinical screen for evaluating the safety of gene therapy vectors.
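
The two quantities mentioned above, the number of independent integration events and the share of the genome that holds half of them, can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that each read has already been reduced to a hypothetical `(chrom, position, strand, barcode)` tuple, that reads sharing all four fields come from the same event, and that the genome has been split into equal-sized bins with one integration count per bin. The function names and the toy data are illustrative only.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical read summary: (chromosome, position, strand, barcode).
Read = Tuple[str, int, str, str]


def count_independent_events(reads: Iterable[Read]) -> int:
    """Count distinct (site, barcode) combinations.

    Assumption: reads sharing site and barcode are PCR copies of one event,
    while different barcodes at the same site mark separate events.
    """
    return len(set(reads))


def genome_fraction_for_share(bin_counts: Sequence[int], share: float = 0.5) -> float:
    """Smallest fraction of equal-sized bins holding at least `share` of all integrations."""
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("no integrations in any bin")
    target = share * total
    running = 0
    # Visit the most heavily targeted bins first.
    for n_bins, count in enumerate(sorted(bin_counts, reverse=True), start=1):
        running += count
        if running >= target:
            return n_bins / len(bin_counts)
    return 1.0


# Toy data, not taken from the study.
toy_reads = [
    ("chr1", 1000, "+", "AAAC"),
    ("chr1", 1000, "+", "AAAC"),  # duplicate read of the same event
    ("chr1", 1000, "+", "GGTT"),  # same site, different barcode
    ("chr2", 5000, "-", "AAAC"),
]
toy_bins = [50, 30, 5, 5, 4, 3, 2, 1, 0, 0]

n_events = count_independent_events(toy_reads)
half_fraction = genome_fraction_for_share(toy_bins)
print(n_events, half_fraction)
```

In this toy example a single bin out of ten holds half of the integrations; on real data the same calculation would be run on genome-wide bin counts, and the result depends on the bin size chosen.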
