A likelihood method for estimating present-day human contamination in ancient DNA samples using low-depth haploid chromosome data

Motivation: The presence of present-day human contaminating DNA fragments is one of the defining challenges of ancient DNA (aDNA) research. The problem is especially acute in the ancient human DNA field, where endogenous molecules are difficult to distinguish from human contaminants because the two are genetically similar. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods that were often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested, and few are aimed at low-depth data, a common feature of aDNA datasets.

Results: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e., when the contaminant and the target come from closely related populations or when error rates are elevated. With a running time below five minutes, our method is applicable to large-scale aDNA genomic studies.

Availability and implementation: The method is implemented in C++ and R and is freely available at https://github.com/sapfo/contaminationX.

Contact: morenomayar@gmail.com, annasapfo.malaspinas@unil.ch.
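The intuition behind a haploid, X-chromosome-based estimate can be illustrated with a toy model. The sketch below is not the likelihood implemented in contaminationX; it is a simplified binomial model written for illustration only. It assumes a male sample (a single X copy), biallelic sites, a known symmetric error rate `eps`, and a known frequency `q` of the non-endogenous allele in the contaminating population. All function names and the input layout are hypothetical.

```python
import math


def mismatch_prob(c, q, eps):
    """Probability that one read shows the non-endogenous allele.

    Toy assumption: a read is a contaminant with probability c, a contaminant
    carries the other allele with probability q, and an error flips the
    observed allele with probability eps.
    """
    return eps + c * q * (1.0 - 2.0 * eps)


def log_likelihood(c, sites, eps):
    """Binomial log-likelihood of c (constant terms dropped).

    sites: iterable of (n_reads, n_mismatch, q) tuples, one per site.
    """
    total = 0.0
    for n_reads, n_mismatch, q in sites:
        # Clamp to avoid log(0) at the boundaries of the parameter space.
        p = min(max(mismatch_prob(c, q, eps), 1e-12), 1.0 - 1e-12)
        total += n_mismatch * math.log(p)
        total += (n_reads - n_mismatch) * math.log(1.0 - p)
    return total


def estimate_contamination(sites, eps, grid_size=1000):
    """Maximize the toy likelihood over c in [0, 1] by grid search."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    return max(grid, key=lambda c: log_likelihood(c, sites, eps))


if __name__ == "__main__":
    # One pooled site class: 1000 reads, 59 mismatches, q = 0.5.
    sites = [(1000, 59, 0.5)]
    print(estimate_contamination(sites, eps=0.01))
```

In this example the observed mismatch fraction is 59/1000 = 0.059, which equals 0.01 + 0.1 × 0.5 × 0.98, so the grid search recovers a contamination fraction of 0.1. A grid search is used only to keep the sketch free of dependencies; it says nothing about how the published method optimizes its likelihood or handles error rates.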
