Detecting rare structural variation in evolving microbial populations from new sequence junctions using breseq

New mutations that cause structural variation (SV) in genomes, including mobile element insertions, large deletions, gene duplications, and other chromosomal rearrangements, can play a key role in microbial evolution. Yet SV is considerably more difficult to predict from short-read genome resequencing data than single-nucleotide substitutions and indels (collectively, SNs), so it is not yet routinely identified in studies that profile population-level genetic diversity over time in evolution experiments. We implemented an algorithm for detecting polymorphic SV as part of the breseq computational pipeline. This procedure examines split-read alignments, in which the two ends of a single sequencing read match disjoint locations in the reference genome, to detect structural variants and estimate their frequencies within a sample. We tested our algorithm using simulated Escherichia coli data and then applied it to 500- and 1000-generation population samples from the Lenski E. coli long-term evolution experiment (LTEE). Knowledge of genes that are targets of selection in the LTEE, together with mutations present in previously analyzed clonal isolates, allowed us to evaluate the accuracy of our procedure. Overall, SV accounted for ~25% of the genetic diversity found in these samples. By profiling rare SV, we identified many cases where alternative mutations in key genes transiently competed within a single population. We also found, unexpectedly, that mutations in two genes that rose to prominence at these early time points always went extinct in the long term. Because it is not limited by the base-calling error rate of the sequencing technology, our approach for identifying rare SV in whole-population samples may have a lower detection limit than predictions of SNs in the same data sets. We anticipate that this functionality of breseq will help provide a more complete picture of genome dynamics during evolution experiments with haploid microorganisms.
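
The split-read idea can be illustrated with a short Python sketch. It groups reads whose two aligned pieces land at disjoint reference locations into a shared junction key and then estimates how common that junction is. This is a minimal sketch under stated assumptions, not breseq's implementation. The `Segment` record, `junction_key`, and `junction_frequency` are hypothetical names. Breakpoint overlap (microhomology) is ignored. The frequency estimator, which divides new-junction reads by those reads plus the mean number of reads continuing unbroken across the reference at the two breakpoint sides, is an assumed form and may differ from the exact formula breseq uses. REL606 (the LTEE ancestor) is used only as a label; the coordinates and read counts are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical minimal record for one aligned piece of a read; a real
# pipeline would derive these from SAM/BAM split (supplementary) alignments.
@dataclass(frozen=True)
class Segment:
    ref: str      # reference sequence name
    start: int    # leftmost aligned reference coordinate (1-based)
    end: int      # rightmost aligned reference coordinate (1-based)
    strand: int   # +1 if aligned forward, -1 if reverse-complemented
    q_start: int  # first read position covered by this segment

Side = Tuple[str, int, int]  # (reference, breakpoint coordinate, strand)


def _side(seg: Segment, first_in_read: bool) -> Side:
    """Reference coordinate where this segment meets the breakpoint."""
    if first_in_read:
        pos = seg.end if seg.strand == 1 else seg.start
    else:
        pos = seg.start if seg.strand == 1 else seg.end
    return (seg.ref, pos, seg.strand)


def junction_key(a: Segment, b: Segment) -> Optional[Tuple[Side, Side]]:
    """Strand-independent key for the new junction implied by a split read,
    or None if the two pieces are contiguous in the reference."""
    first, second = sorted((a, b), key=lambda s: s.q_start)
    s1, s2 = _side(first, True), _side(second, False)
    # Same reference, same strand, adjacent coordinates: an ordinary alignment.
    if s1[0] == s2[0] and s1[2] == s2[2] and s2[1] == s1[1] + s1[2]:
        return None
    # A read from the opposite strand reports the sides swapped with flipped
    # strands; keep the smaller representation so both map to one key.
    flipped = ((s2[0], s2[1], -s2[2]), (s1[0], s1[1], -s1[2]))
    return min((s1, s2), flipped)


def junction_frequency(new_reads: int, ref_side1: int, ref_side2: int) -> float:
    """Assumed estimator: new-junction reads over those reads plus the mean
    number of reads that continue unbroken across the reference at each side."""
    total = new_reads + (ref_side1 + ref_side2) / 2
    return new_reads / total if total > 0 else 0.0


# Toy data: five reads joining position 100 to position 501 (a 400-bp
# deletion of 101-500), one of them sequenced from the opposite strand.
reads = [(Segment("REL606", 30, 100, 1, 1), Segment("REL606", 501, 560, 1, 71))] * 4
reads.append((Segment("REL606", 501, 570, -1, 1), Segment("REL606", 40, 100, -1, 71)))

counts = Counter(k for k in (junction_key(a, b) for a, b in reads) if k is not None)
key, n = counts.most_common(1)[0]
freq = junction_frequency(n, 18, 22)  # 18 and 22 unbroken reads at the two sides
print(key, n, round(freq, 3))
# prints: (('REL606', 100, 1), ('REL606', 501, 1)) 5 0.2
```

Canonicalizing the key matters because a junction is sampled by reads from both DNA strands. Without it, the reverse-strand read would be counted as a separate event and the frequency would be underestimated.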
