microDecon: A highly accurate read‐subtraction tool for the post‐sequencing removal of contamination in metabarcoding studies

Contamination is a ubiquitous problem in microbiome research and can skew results, especially when only small amounts of target DNA are available. Nevertheless, no clear solution has emerged for removing microbial contamination. To address this problem, we developed the R package microDecon (https://github.com/donaldtmcknight/microDecon), which uses the proportions of contaminant operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) in blank samples to systematically identify and remove contaminant reads from metabarcoding data sets. We rigorously tested microDecon using a series of computer simulations and a sequencing experiment. We also compared it to the common practice of simply removing all contaminant OTUs/ASVs and to other methods for removing contamination. Both the computer simulations and our sequencing data confirmed the utility of microDecon. In our largest simulation (100,000 samples), using microDecon improved the results in 98.1% of samples. Additionally, in the sequencing data and in simulations involving groups, it enabled accurate clustering of groups as well as the detection of previously obscured patterns. It also produced more accurate results than the existing methods for identifying and removing contamination. These results demonstrate that microDecon effectively removes contamination across a broad range of situations. It should, therefore, be widely applicable to microbiome studies, as well as to metabarcoding studies in general.
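The idea of subtracting reads according to their proportions in a blank can be illustrated with a minimal sketch. This is not microDecon's implementation or API: the function name `subtract_contamination`, the dictionary inputs, and the assumption that the user supplies one "constant" OTU known to be entirely contaminant are all hypothetical choices made for illustration. Under those assumptions, the ratio of that OTU's reads in the sample to its reads in the blank gives a scaling factor, and the scaled blank profile is subtracted from the sample.

```python
def subtract_contamination(blank, sample, constant):
    """Subtract blank-derived contaminant reads from one sample.

    blank, sample: dicts mapping OTU name -> read count.
    constant: name of an OTU ASSUMED to be entirely contamination
              (a hypothetical input for this sketch).
    """
    blank_const = blank.get(constant, 0)
    sample_const = sample.get(constant, 0)
    # Without the constant OTU in both profiles there is nothing to scale by,
    # so the sample is returned unchanged.
    if blank_const <= 0 or sample_const <= 0:
        return dict(sample)

    # How many times larger the contamination is in the sample than in the blank.
    scale = sample_const / blank_const

    cleaned = {}
    for otu, reads in sample.items():
        # Expected contaminant reads for this OTU, from its share of the blank.
        expected = blank.get(otu, 0) * scale
        # Read counts cannot go below zero.
        cleaned[otu] = max(0, round(reads - expected))
    return cleaned


blank = {"OTU1": 100, "OTU2": 50, "OTU3": 0}
sample = {"OTU1": 200, "OTU2": 400, "OTU3": 300}
cleaned = subtract_contamination(blank, sample, constant="OTU1")
print(cleaned)
```

In this toy example OTU2 appears in both the blank and the sample, so only part of its reads is removed rather than the whole OTU, which is the contrast with the practice of simply deleting every OTU/ASV found in a blank.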
