Quantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA sequences

Background: The study of ancient DNA is hampered by degradation, resulting in short DNA fragments. Advances in laboratory methods have made it possible to retrieve short DNA fragments, thereby improving access to DNA preserved in highly degraded, ancient material. However, such material contains large amounts of microbial contamination in addition to DNA fragments from the ancient organism. The resulting mixture of sequences constitutes a challenge for computational analysis, since microbial sequences are hard to distinguish from the ancient sequences of interest, especially when they are short.

Results: Here, we develop a method to quantify spurious alignments based on the presence or absence of rare variants. We find that spurious alignments are enriched for mismatches and insertion/deletion differences and lack substitution patterns typical of ancient DNA. The impact of spurious alignments can be reduced by filtering on these features and by imposing a sample-specific minimum length cutoff. We apply this approach to sequences from four ~430,000-year-old Sima de los Huesos hominin remains, which contain particularly short DNA fragments, and increase the amount of usable sequence data by 17–150%. This allows us to place a third specimen from the site on the Neandertal lineage.

Conclusions: Our method maximizes the sequence data amenable to genetic analysis from highly degraded ancient material and avoids pitfalls that are associated with the analysis of ultra-short DNA sequences.
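The idea of a sample-specific minimum length cutoff can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `choose_min_length`, the per-length input tables, and the 2% tolerance are all assumptions made for illustration, and the counts are invented. The sketch assumes that an estimate of the number of spurious alignments per sequence length is already available, and it picks the shortest cutoff for which the estimated spurious fraction among all retained alignments stays within the tolerance.

```python
from typing import Dict, Optional


def choose_min_length(
    total_by_length: Dict[int, int],
    spurious_by_length: Dict[int, float],
    max_spurious_fraction: float = 0.02,  # hypothetical tolerance, not from the paper
) -> Optional[int]:
    """Return the smallest cutoff L such that alignments of length >= L
    have an estimated spurious fraction <= max_spurious_fraction.

    Returns None if even the longest alignments exceed the tolerance.
    """
    total = 0
    spurious = 0.0
    best = None
    # Walk from the longest length downwards, accumulating retained alignments.
    for length in sorted(total_by_length, reverse=True):
        total += total_by_length[length]
        spurious += spurious_by_length.get(length, 0.0)
        if total == 0:
            continue  # no alignments seen yet, nothing to evaluate
        if spurious / total > max_spurious_fraction:
            break  # including this length would exceed the tolerance
        best = length
    return best


# Invented example counts for one sample (illustration only).
example_total = {25: 1000, 30: 1000, 35: 1000}
example_spurious = {25: 500.0, 30: 20.0, 35: 1.0}
cutoff = choose_min_length(example_total, example_spurious)
print(cutoff)
```

Because the cutoff is derived from each sample's own counts, a library with little microbial background would keep shorter sequences than a heavily contaminated one, which is the sense in which such a cutoff is sample-specific.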
