Tag jumps illuminated – reducing sequence‐to‐sample misidentifications in metabarcoding studies

Metabarcoding of environmental samples on second‐generation sequencing platforms has rapidly become a valuable tool for ecological studies. The approach fundamentally relies on being able to trace tagged amplicons back to the samples from which they originated. Here, we address the problem of sequences in metabarcoding outputs that carry false combinations of used tags (tag jumps). Unless such sequences can be identified and excluded from downstream analyses, tag jumps that produce false tag combinations that are also in use for other samples can cause sequences to be assigned to the wrong samples and can artificially inflate diversity. We document and investigate tag jumping in metabarcoding studies on Illumina sequencing platforms. To do so, we amplified mixed‐template extracts obtained from bat droppings and leech gut contents with tagged generic arthropod and mammal primers, respectively. On average, 2.6% and 2.1% of sequences in the leech and bat diet studies, respectively, had tag combinations that could be explained by tag jumping. We suggest that tag jumping can occur during blunt‐ending of pools of tagged amplicons during library build, and as a consequence of chimera formation during bulk amplification of tagged amplicons in the library index PCR. We argue that tag jumping and contamination between libraries represent a considerable challenge for Illumina‐based metabarcoding studies, and we suggest measures to avoid false assignment of tag jumping‐derived sequences to samples.
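The idea of detecting tag jumps through false combinations of used tags can be illustrated with a short sketch. It assumes that each read has already been demultiplexed into a (forward tag, reverse tag) pair plus a sequence, and that the set of tag pairs assigned to real samples is known. The function names, the data layout, and the filtering cut-off are all hypothetical illustrations. They are not the authors' procedure. The first function measures the fraction of reads that carry a never-assigned combination of otherwise used tags. The second applies a simple, conservative threshold that discards low-count occurrences of a sequence in a sample when those counts could be explained by jumps at that rate.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: each demultiplexed read is ((forward_tag, reverse_tag), sequence).
TagPair = Tuple[str, str]
Read = Tuple[TagPair, str]


def tag_jump_fraction(reads: List[Read], used_pairs: Set[TagPair]) -> float:
    """Fraction of reads whose tags were all used in the experiment but whose
    combination was never assigned to a sample (a detectable tag jump)."""
    used_fwd = {fwd for fwd, _ in used_pairs}
    used_rev = {rev for _, rev in used_pairs}
    considered = 0
    false_combos = 0
    for (fwd, rev), _seq in reads:
        # Reads carrying a tag that was never used are skipped: they point to
        # sequencing or synthesis errors rather than to tag jumps.
        if fwd not in used_fwd or rev not in used_rev:
            continue
        considered += 1
        if (fwd, rev) not in used_pairs:
            false_combos += 1
    return false_combos / considered if considered else 0.0


def filter_occurrences(
    reads: List[Read], used_pairs: Set[TagPair], rate: float
) -> Dict[Tuple[TagPair, str], int]:
    """Keep a (sample, sequence) count only if it exceeds rate * the sequence's
    total reads across all used combinations (a hypothetical, conservative cut-off)."""
    counts: Counter = Counter()
    seq_totals: Counter = Counter()
    for pair, seq in reads:
        if pair in used_pairs:  # only reads that map to a real sample
            counts[(pair, seq)] += 1
            seq_totals[seq] += 1
    return {
        key: n
        for key, n in counts.items()
        if n > rate * seq_totals[key[1]]
    }


if __name__ == "__main__":
    used = {("A", "X"), ("B", "Y")}
    reads = (
        [(("A", "X"), "s1")] * 297
        + [(("B", "Y"), "s2")] * 97
        + [(("A", "Y"), "s1")] * 2  # false combination of used tags: detectable jump
        + [(("B", "Y"), "s1")] * 1  # jump into a used combination: invisible from tags alone
        + [(("C", "Z"), "s3")] * 2  # tags never used: ignored
    )
    rate = tag_jump_fraction(reads, used)
    print(f"detectable tag-jump fraction: {rate:.4f}")
    print(filter_occurrences(reads, used, rate))
```

In this toy example, 2 of the 397 reads with used tags fall into an unassigned combination. The estimated rate therefore removes the single `s1` read that landed in sample `("B", "Y")`, while both genuine occurrences are kept. Scaling the cut-off by each sequence's total abundance is one plausible choice among several. Abundant sequences generate more jumped reads, so their spurious occurrences need a higher bar than those of rare sequences.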