Index hopping on the Illumina HiSeq X platform and its consequences for ancient DNA studies

The high-throughput capacity of Illumina sequencing platforms and the ability to label samples individually have encouraged wide use of sample multiplexing. However, this practice results in read misassignment (usually <1%) across samples sequenced on the same lane. Alarmingly high misassignment rates of up to 10% have been reported for Illumina sequencing machines that use exclusion amplification chemistry. Such rates may make these platforms unsuitable, particularly for studies that rely on low-quantity and low-quality samples, such as historical and archaeological specimens. Here, we use barcodes, short sequences ligated to both ends of the DNA insert, to directly quantify the rate of index hopping in 100-year-old museum-preserved gorilla (Gorilla beringei) samples. After correcting for multiple sources of noise, we find that on average 0.470% of reads contain a hopped index. We show that the sample-specific quantity of misassigned reads depends on the number of reads that a given sample contributes to the total sequencing pool, so that samples with few sequenced reads receive the greatest proportion of misassigned reads. This particularly affects ancient DNA samples, as these frequently differ in DNA quantity and endogenous content. Through simulations, we show that even low rates of index hopping, as reported here, can bias ancient DNA studies when samples with vastly different quantities of endogenous material are multiplexed.
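The dependence of the misassigned proportion on a sample's share of the sequencing pool can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the simulation used in the study; it assumes a deliberately simple model in which every read hops with the same probability and a hopped read is assigned uniformly at random to one of the other samples on the lane. The function name `misassigned_fraction` and the example read counts are hypothetical.

```python
def misassigned_fraction(read_counts, hop_rate):
    """Expected fraction of reads in each sample that originate elsewhere.

    Assumed toy model (not taken from the study): each read hops with
    probability `hop_rate`, and a hopped read lands uniformly at random
    in one of the other samples sharing the lane.
    """
    k = len(read_counts)
    if k < 2:
        raise ValueError("index hopping needs at least two multiplexed samples")
    total = sum(read_counts)
    fractions = []
    for n in read_counts:
        kept = n * (1.0 - hop_rate)                   # own reads that stay put
        received = hop_rate * (total - n) / (k - 1)   # foreign reads hopping in
        fractions.append(received / (kept + received))
    return fractions


if __name__ == "__main__":
    # Hypothetical lane: two deep libraries and one shallow library.
    counts = [1_000_000, 1_000_000, 10_000]
    # 0.470% is the average hop rate reported in the abstract.
    for n, frac in zip(counts, misassigned_fraction(counts, 0.0047)):
        print(f"{n:>9} reads: {frac:.2%} of the final reads are misassigned")
```

Under this assumed model, a lane of equally sized libraries gives every sample a misassigned fraction equal to the hop rate itself, whereas the shallow library in the example ends up with a far larger share of foreign reads than the two deep ones, even though the per-read hop rate is identical.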
