Ligation Bias in Illumina Next-Generation DNA Libraries: Implications for Sequencing Ancient Genomes

Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, the latter often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs is biased against DNA templates starting with thymine residues, in contrast to blunt-end adapter ligation. We observe the same bias in fresh DNA extracts sheared with Bioruptor, Covaris and nebulizer devices. This contradicts previous reports suggesting that the bias could originate from the methods used for shearing DNA. It also suggests that the efficiency of AT-overhang adapter ligation is sequence-dependent and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared with AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines, and therefore with deaminated cytosines. This produces nucleotide misincorporation patterns that deviate from the signature generally expected when authenticating ancient sequence data. Consequently, we show that models suited to estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries.
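The bias against templates starting with thymine can be illustrated by tabulating the first base of each read and comparing it with a reference composition. The following is a minimal sketch, not the analysis pipeline used in the study: the function names `first_base_frequencies` and `depletion_ratio`, the toy reads, and the uniform expected composition are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ACGT"

def first_base_frequencies(reads):
    """Return the fraction of reads starting with each of A, C, G and T.

    Reads that are empty or start with another symbol (e.g. N) are ignored.
    """
    counts = Counter(read[0].upper() for read in reads if read)
    total = sum(counts[base] for base in BASES)
    if total == 0:
        return {base: 0.0 for base in BASES}
    return {base: counts[base] / total for base in BASES}

def depletion_ratio(observed, expected, base="T"):
    """Observed / expected frequency for one base; values below 1 suggest depletion."""
    if expected[base] == 0:
        raise ValueError("expected frequency must be non-zero")
    return observed[base] / expected[base]

# Toy reads (hypothetical data): only one of eight reads starts with T.
toy_reads = ["ACGT", "AGGT", "CCTA", "CATG", "GTTA", "GACC", "ATTG", "TTAA"]
observed = first_base_frequencies(toy_reads)

# Assumed reference composition; in practice this would come from the genome
# or from a library built with blunt-end ligation.
expected = {base: 0.25 for base in BASES}
ratio = depletion_ratio(observed, expected, base="T")
```

In a real comparison, the expected composition would be estimated from the same extract prepared with blunt-end adapters, so that a ratio well below 1 for T points to the ligation step rather than to genome composition.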
