Imputation of ancient genomes

Due to postmortem DNA degradation, most ancient genomes sequenced to date have low depth of coverage, which prevents the true underlying genotypes from being recovered. Genotype imputation has been put forward as a way to improve genotyping accuracy for low-coverage genomes. However, it is unknown how accurately ancient genomes can be imputed and whether imputation introduces bias into downstream analyses. To address these questions, we downsampled 43 ancient genomes from different times and continents, 42 of which are high-coverage (above 10x) and three of which constitute a trio (mother, father and son), to simulate data with coverage in the range of 0.1x-2.0x, and imputed these using state-of-the-art methods and reference panels. We assessed imputation accuracy across ancestries and depths of coverage. We found that ancient and modern DNA imputation accuracies were comparable. Most of the 42 high-coverage genomes downsampled to 1x were imputed with low error rates (below 5%); error rates were higher for African genomes, which are underrepresented in the reference panel. We used the ancient trio data to validate imputation and phasing results with an orthogonal approach based on Mendel’s rules of inheritance. This yielded imputation and switch error rates of 1.9% and 2.0%, respectively, for 1x genomes. We further compared the results of downstream analyses between imputed and high-coverage genomes, namely principal component analysis (PCA), genetic clustering, and runs of homozygosity (ROH). For these three approaches, we observed similar results between imputed and high-coverage genomes at depths of coverage of at least 0.5x, except for African genomes, for which the decreased imputation accuracy affected ROH estimates. Altogether, these results suggest that, for most populations and for depths of coverage as low as 0.5x, imputation is a reliable method with the potential to expand and improve ancient DNA studies.
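
To illustrate the idea behind a trio-based check, the following minimal Python sketch counts sites at which a child's genotype cannot be explained by Mendelian transmission from the two parents. It is an illustration under stated assumptions, not the procedure used in the study: genotypes are assumed to be biallelic and coded as alternative-allele counts (0, 1 or 2), and the function names and example sites are hypothetical. The rate it computes is a Mendelian inconsistency rate, which is not necessarily the same quantity as the imputation or switch error rates reported above.

```python
from itertools import product

# Alleles (0 = reference, 1 = alternative) that a parent can transmit,
# keyed by that parent's genotype coded as an alternative-allele count.
# Assumption: biallelic, diploid, autosomal sites only.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_consistent(mother, father, child):
    """Return True if the child's genotype can arise from one allele per parent."""
    possible = {
        a + b
        for a, b in product(TRANSMITTABLE[mother], TRANSMITTABLE[father])
    }
    return child in possible


def mendelian_error_rate(trio_genotypes):
    """Fraction of sites whose (mother, father, child) genotypes violate Mendel's rules."""
    trios = list(trio_genotypes)
    if not trios:
        raise ValueError("at least one site is required")
    errors = sum(not is_mendelian_consistent(m, f, c) for m, f, c in trios)
    return errors / len(trios)


# Hypothetical (mother, father, child) genotypes at five sites.
example_sites = [(0, 0, 0), (0, 1, 1), (2, 2, 1), (1, 1, 2), (0, 2, 0)]
print(mendelian_error_rate(example_sites))
```

In this toy input, the third and fifth sites are inconsistent: two homozygous-alternative parents cannot have a heterozygous child, and a child of one homozygous-reference and one homozygous-alternative parent must be heterozygous.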
