An independent evaluation in a CRC patient cohort of microbiome 16S rRNA sequence analysis methods: OTU clustering, DADA2, and Deblur

The 16S rRNA gene is found in all bacteria and archaea, and it is widely used as a target gene for profiling microbial communities via next-generation sequencing (NGS). Traditionally, sequences are clustered into operational taxonomic units (OTUs) at a 97% identity threshold, the conventional 16S rRNA-based taxonomic standard, without any explicit correction of sequencing errors, which may produce spurious taxonomic units. Several denoising algorithms, such as DADA2 and Deblur, have been published to address this problem; they correct sequencing errors at single-nucleotide resolution and generate amplicon sequence variants (ASVs). Because high-resolution ASVs are becoming more popular than OTUs, and because a given study usually applies only one analysis method, a thorough comparison of OTU clustering and denoising pipelines is needed. In this study, three of the most widely used 16S rRNA analysis methods (the denoising algorithms DADA2 and Deblur, along with de novo OTU clustering) were compared using 16S rRNA amplicon sequencing data generated from 358 clinical stool samples from the Colorectal Cancer (CRC) Screening Cohort. All approaches led to similar taxonomic profiles (P > 0.05 in PERMANOVA and P < 0.001 in the Mantel test), although the number of ASVs/OTUs and the alpha-diversity indices varied considerably. Although the specific disease-related markers identified differed considerably between methods, disease-related analysis showed that all methods led to similar conclusions. Fusobacterium, Streptococcus, Peptostreptococcus, Parvimonas, Gemella, and Haemophilus were identified by all three methods as enriched in the CRC group, while Roseburia, Faecalibacterium, Butyricicoccus, and Blautia were identified by all three methods as enriched in the healthy group.
In addition, disease-diagnostic models built with machine learning algorithms on the output of each method all achieved good diagnostic performance (AUC: 0.87–0.89), with the DADA2-based model producing the highest AUC (0.8944 in the training set and 0.8907 in the test set). However, the differences in performance between the models were not significant (P > 0.05). In conclusion, this study demonstrates that DADA2, Deblur, and de novo OTU clustering display similar power in taxonomic assignment and lead to similar conclusions in this CRC cohort.
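
The Mantel test mentioned above asks whether two between-sample distance matrices, for example one per analysis pipeline, are correlated more strongly than chance would allow. The following is a minimal sketch of a one-sided permutation Mantel test, not the implementation used in the study. The function name `mantel_test`, the toy coordinates, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np


def mantel_test(d1, d2, permutations=999, seed=0):
    """One-sided permutation Mantel test (illustrative sketch).

    Returns the Pearson correlation between the upper triangles of two
    square distance matrices and a permutation p-value for the
    hypothesis that the correlation is larger than expected by chance.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    upper = np.triu_indices(n, k=1)  # each sample pair counted once
    x = d1[upper]
    r_obs = np.corrcoef(x, d2[upper])[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        # Shuffle sample labels of the second matrix (rows and columns together).
        order = rng.permutation(n)
        r_perm = np.corrcoef(x, d2[np.ix_(order, order)][upper])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    # Add one to both counts so the observed statistic is included.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


# Toy data (hypothetical): 30 samples whose distances under "pipeline B"
# are a slightly noisy copy of their distances under "pipeline A".
toy_rng = np.random.default_rng(42)
points = toy_rng.normal(size=(30, 5))
d_a = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
noise = toy_rng.normal(scale=0.05, size=d_a.shape)
d_b = d_a + (noise + noise.T) / 2  # keep the matrix symmetric
np.fill_diagonal(d_b, 0.0)

r, p = mantel_test(d_a, d_b)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

In this setting a small P value indicates agreement: the sample-to-sample distances produced by the two pipelines are strongly correlated. That is why P < 0.001 in the Mantel test and P > 0.05 in PERMANOVA both point toward similar profiles.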
