Wide spectrum and high frequency of genomic structural variation, including transposable elements, in large double-stranded DNA viruses

Abstract

Our knowledge of the diversity and frequency of genomic structural variation segregating in populations of large double-stranded (ds) DNA viruses is limited. Here, we sequenced the genome of a baculovirus (Autographa californica multiple nucleopolyhedrovirus [AcMNPV]) purified from beet armyworm (Spodoptera exigua) larvae at depths >195,000× using both short-read (Illumina) and long-read (PacBio) technologies. Using a pipeline that relies on hierarchical clustering of structural variants (SVs) detected in individual short and long reads by six variant callers, we identified a total of 1,141 SVs in AcMNPV, including 464 deletions, 443 inversions, 160 duplications, and 74 insertions. These variants are considered robust and unlikely to result from technical artifacts because each was independently detected in at least three long reads as well as at least three short reads. SVs are distributed along the entire AcMNPV genome and may involve large genomic regions (30,496 bp on average). We show that at least 39.9% of genomes in AcMNPV populations carry at least one SV, that the vast majority of SVs (75%) segregate at very low frequency (<0.01%), and that very few SVs persist after ten replication cycles, consistent with a negative impact of most SVs on AcMNPV fitness. Using short-read sequencing datasets, we then show that populations of two iridoviruses and one herpesvirus are also rich in SVs: they contain between 426 and 1,102 SVs, carried by 52.4–80.1% of genomes. Finally, AcMNPV long reads allowed us to identify 1,757 transposable element (TE) insertions, 895 of which are truncated and occur at one extremity of the reads. This further supports the role of baculoviruses as possible vectors of horizontal transfer of TEs. Altogether, we found that SVs, which evolve mostly under rapid dynamics of gain and loss in viral populations, represent an important feature in the biology of large dsDNA viruses.
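The filtering logic summarized in the abstract (cluster per-read SV calls, then keep only clusters seen in at least three long reads and at least three short reads) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `SVCall` record, the 50 bp tolerance `tol`, and the use of simple single-linkage clustering on breakpoint coordinates are all assumptions made for illustration; only the two support thresholds come from the text.

```python
from collections import defaultdict
from typing import NamedTuple


class SVCall(NamedTuple):
    """Hypothetical record for one SV detected in one read."""
    sv_type: str   # e.g. "DEL", "INV", "DUP", "INS"
    start: int     # breakpoint coordinates on the reference genome
    end: int
    read_id: str   # read in which the variant was detected
    tech: str      # "short" or "long"


def cluster_calls(calls, tol=50):
    """Single-linkage clustering of calls (assumed stand-in method).

    Two calls are linked when they share an SV type and both their start
    and end coordinates differ by at most `tol` bp; clusters are the
    connected components of that graph. Pairwise comparison is O(n^2),
    which is fine for a sketch but not for a deep dataset.
    """
    parent = list(range(len(calls)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, a in enumerate(calls):
        for j in range(i + 1, len(calls)):
            b = calls[j]
            if (a.sv_type == b.sv_type
                    and abs(a.start - b.start) <= tol
                    and abs(a.end - b.end) <= tol):
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, call in enumerate(calls):
        clusters[find(i)].append(call)
    return list(clusters.values())


def robust_clusters(clusters, min_long=3, min_short=3):
    """Keep clusters supported by enough distinct long AND short reads."""
    kept = []
    for cluster in clusters:
        long_reads = {c.read_id for c in cluster if c.tech == "long"}
        short_reads = {c.read_id for c in cluster if c.tech == "short"}
        if len(long_reads) >= min_long and len(short_reads) >= min_short:
            kept.append(cluster)
    return kept
```

Counting distinct read identifiers, rather than raw calls, keeps a variant reported several times for the same read (for example by different callers) from inflating its support.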

[53]  Raul Andino,et al.  The role of mutational robustness in RNA virus evolution , 2013, Nature Reviews Microbiology.

[54]  R. Fontana,et al.  Minimum-Size Mixed-Level Orthogonal Fractional Factorial Designs Generation: A SAS-Based Algorithm , 2013 .

[55]  Ryan M. Layer,et al.  LUMPY: a probabilistic framework for structural variant discovery , 2012, Genome Biology.

[56]  S. Elena,et al.  Model-Selection-Based Approach for Calculating Cellular Multiplicity of Infection during Virus Colonization of Multi-Cellular Hosts , 2013, PloS one.

[57]  Baculovirus nucleocapsid aggregation (MNPV vs SNPV): an evolutionary strategy, or a product of replication conditions? , 2014, Virus Genes.

[58]  R. Andino,et al.  Library preparation for highly accurate population sequencing of RNA viruses , 2014, Nature Protocols.

[59]  T. Kowalik,et al.  The DNA Damage Response Induced by Infection with Human Cytomegalovirus and Other Viruses , 2014, Viruses.

[60]  C. Fraser,et al.  Single molecule sequencing and genome assembly of a clinical specimen of Loa loa, the causative agent of loiasis , 2014, BMC Genomics.

[61]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[62]  Christina A. Cuomo,et al.  Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement , 2014, PloS one.

[63]  Adam C. English,et al.  PBHoney: identifying genomic variants via long-read discordance and interrupted mapping , 2014, BMC Bioinformatics.

[64]  Raul Andino,et al.  Mutational and fitness landscapes of an RNA virus revealed through population sequencing , 2013, Nature.

[65]  C. Cruaud,et al.  Genome sequence of a crustacean iridovirus, IIV31, isolated from the pill bug, Armadillidium vulgare. , 2014, The Journal of general virology.

[66]  Martin Hunt,et al.  Summarizing Specific Profiles in Illumina Sequencing from Whole-Genome Amplified DNA , 2013, DNA research : an international journal for rapid publication of reports on genes and genomes.

[67]  C. López Defective Viral Genomes: Critical Danger Signals of Viral Infections , 2014, Journal of Virology.

[68]  D. Jarvis,et al.  Complete Genome Sequence of the Autographa californica Multiple Nucleopolyhedrovirus Strain E2 , 2014, Genome Announcements.

[69]  E. Herniou,et al.  Population genomics supports baculoviruses as vectors of horizontal transfer of insect transposons , 2014, Nature Communications.

[70]  W. Britt,et al.  Limits and patterns of cytomegalovirus genomic diversity in humans , 2015, Proceedings of the National Academy of Sciences.

[71]  Mark Yandell,et al.  Wham: Identifying Structural Variants of Biological Consequence , 2015, PLoS Comput. Biol..

[72]  E. Herniou,et al.  Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations , 2015, Viruses.

[73]  Mark Gerstein,et al.  MetaSV: an accurate and integrative structural-variant caller for next generation sequencing , 2015, Bioinform..

[74]  E. Herniou,et al.  Gene Acquisition Convergence between Entomopoxviruses and Baculoviruses , 2015, Viruses.

[75]  Heng Li,et al.  FermiKit: assembly-based variant calling for Illumina resequencing data , 2015, Bioinform..

[76]  O. Kohany,et al.  Repbase Update, a database of repetitive elements in eukaryotic genomes , 2015, Mobile DNA.

[77]  Guangdi Li,et al.  High-Throughput Analysis of Human Cytomegalovirus Genome Diversity Highlights the Widespread Occurrence of Gene-Disrupting Mutations and Pervasive Recombination , 2015, Journal of Virology.

[78]  J. Filée Genomic comparison of closely related Giant Viruses supports an accordion-like model of evolution , 2015, Front. Microbiol..

[79]  John E. Johnson,et al.  ClickSeq: Fragmentation-Free Next-Generation Sequencing via Click Ligation of Adaptors to Stochastically Terminated 3'-Azido cDNAs. , 2015, Journal of molecular biology.

[80]  J. Jensen,et al.  On the Analysis of Intrahost and Interhost Viral Populations: Human Cytomegalovirus as a Case Study of Pitfalls and Expectations , 2016, Journal of Virology.

[81]  S. Neumann,et al.  Natural variation of root exudates in Arabidopsis thaliana-linking metabolomic and genomic data , 2016, Scientific Reports.

[82]  Wolfgang Losert,et al.  svclassify: a method to establish benchmark structural variant calls , 2015, BMC Genomics.

[83]  Xiaoyu Chen,et al.  Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications , 2016, Bioinform..

[84]  Sven Rahmann,et al.  SimLoRD: Simulation of Long Read Data , 2016, Bioinform..

[85]  B. Moumen,et al.  Continuous Influx of Genetic Material from Host to Virus Populations , 2016, PLoS genetics.

[86]  Aris Katzourakis,et al.  De Novo Assembly of Human Herpes Virus Type 1 (HHV-1) Genome, Mining of Non-Canonical Structures and Detection of Novel Drug-Resistance Mutations Using Short- and Long-Read Next Generation Sequencing Technologies , 2016, PloS one.

[87]  Rafael Sanjuán,et al.  Mechanisms of viral mutation , 2016, Cellular and Molecular Life Sciences.

[88]  A. Routh,et al.  Parallel ClickSeq and Nanopore sequencing elucidates the rapid evolution of defective-interfering RNAs in Flock House virus , 2017, PLoS pathogens.

[89]  S. Koren,et al.  Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation , 2016, bioRxiv.

[90]  J. Peccoud,et al.  Massive horizontal transfer of transposable elements in insects , 2017, Proceedings of the National Academy of Sciences.

[91]  Richard Cordaux,et al.  Viruses as vectors of horizontal transfer of genetic material in eukaryotes. , 2017, Current opinion in virology.

[92]  J. Oliveros,et al.  Reduced accumulation of defective viral genomes contributes to severe outcome in influenza virus infected patients , 2017, PLoS Pathogens.

[93]  Michael C. Schatz,et al.  Accurate detection of complex structural variations using single molecule sequencing , 2017, Nature Methods.

[94]  J. Li,et al.  Large-scale comparative epigenomics reveals hierarchical regulation of non-CG methylation in Arabidopsis , 2018, Proceedings of the National Academy of Sciences.

[95]  J. Peccoud,et al.  A Survey of Virus Recombination Uncovers Canonical Features of Artificial Chimeras Generated During Deep Sequencing Library Preparation , 2018, G3: Genes, Genomes, Genetics.

[96]  M. Vignuzzi,et al.  Dicer-2-Dependent Generation of Viral DNA from Defective Genomes of RNA Viruses Modulates Antiviral Immunity in Insects , 2018, Cell host & microbe.

[97]  C. López,et al.  Defective (interfering) viral genomes re-explored: impact on antiviral immunity and virus persistence , 2018, Future virology.

[98]  Aaron R Quinlan,et al.  Long read sequencing reveals poxvirus evolution through rapid homogenization of gene arrays , 2018, bioRxiv.

[99]  P. Klenerman,et al.  Nanopore sequencing and full genome de novo assembly of human cytomegalovirus TB40/E reveals clonal diversity and structural variations , 2018, BMC Genomics.

[100]  Shao-Wu Zhang,et al.  NPBSS: a new PacBio sequencing simulator for generating the continuous long reads with an empirical model , 2018, BMC Bioinformatics.

[101]  Goo Jun,et al.  Parliament2: Fast Structural Variant Calling Using Optimized Combinations of Callers , 2018, bioRxiv.

[102]  M. Mehta,et al.  PacBio library preparation using blunt-end adapter ligation produces significant artefactual fusion DNA sequences , 2018, bioRxiv.

[103]  Chaochun Wei,et al.  PaSS: a sequencing simulator for PacBio sequencing , 2019, BMC Bioinformatics.

[104]  J. Peccoud,et al.  Global survey of mobile DNA horizontal transfer in arthropods reveals Lepidoptera as a prime hotspot , 2019, PLoS genetics.

[105]  B. Moumen,et al.  The Genome of Armadillidium vulgare (Crustacea, Isopoda) Provides Insights into Sex Chromosome Evolution in the Context of Cytoplasmic Sex Determination. , 2019, Molecular biology and evolution.

[106]  C. Feschotte,et al.  Host–transposon interactions: conflict, cooperation, and cooption , 2019, Genes & development.

[107]  Moriah L. Szpara,et al.  Genotypic and Phenotypic Diversity of Herpes Simplex Virus 2 within the Infected Neonatal Population , 2019, mSphere.

[108]  Asif U. Tamuri,et al.  Human cytomegalovirus haplotype reconstruction reveals high diversity due to superinfection and evidence of within-host recombination , 2019, Proceedings of the National Academy of Sciences.