Random sampling causes the low reproducibility of rare eukaryotic OTUs in Illumina COI metabarcoding

DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of the variation in OTU presence–absence, whereas biases associated with indexed PCRs accounted for a larger amount of the variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study. Systematic removal of rare OTUs may avoid inflating diversity based on common β descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically as it will depend upon the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.
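
The link between random sampling and the inconsistent detection of rare OTUs can be illustrated with a simple binomial model. The sketch below is not the analysis used in the study. It assumes that every read is drawn independently and that an OTU's share of the library is fixed, and the function names, sequencing depth, and relative abundances are hypothetical values chosen for illustration; only the count of seven replicates echoes the seven indexed primers mentioned above.

```python
def detection_probability(rel_abundance: float, depth: int) -> float:
    """Probability that an OTU yields at least one read in one replicate.

    Assumes each of `depth` reads is an independent draw and that the OTU
    makes up a fixed fraction `rel_abundance` of the sequenced library.
    """
    return 1.0 - (1.0 - rel_abundance) ** depth


def prob_detected_in_all(rel_abundance: float, depth: int, replicates: int) -> float:
    """Probability that the OTU is detected in every one of `replicates`
    independent replicates of equal depth."""
    return detection_probability(rel_abundance, depth) ** replicates


if __name__ == "__main__":
    depth = 100_000   # hypothetical reads per replicate
    replicates = 7    # illustrative; mirrors the seven indexed primers

    # Hypothetical relative abundances, from rare to common.
    for rel_abundance in (1e-5, 1e-4, 1e-3):
        expected_reads = rel_abundance * depth
        p_one = detection_probability(rel_abundance, depth)
        p_all = prob_detected_in_all(rel_abundance, depth, replicates)
        print(
            f"expected reads = {expected_reads:g}: "
            f"P(detected in one replicate) = {p_one:.3f}, "
            f"P(detected in all {replicates}) = {p_all:.3f}"
        )
```

Under these assumptions, an OTU expected to contribute about one read per replicate is often missed in at least one replicate, whereas an OTU expected to contribute about a hundred reads is detected essentially every time. This is consistent with the suggestion that presence–absence variation among rare OTUs can arise from sampling alone.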
