Reintroducing mothur: 10 Years Later

More than 10 years ago, we published the paper describing the mothur software package in Applied and Environmental Microbiology. Our goal was to create a comprehensive package that allowed users to analyze amplicon sequence data using the most robust methods available. mothur has helped lead the community through the ongoing sequencing revolution and continues to provide this service to the microbial ecology community. Beyond its success and impact on the field, mothur’s development exposed a series of observations that are generally translatable across science. Perhaps the observation that stands out the most is that all science is done in the context of prevailing ideas and available technologies. Although it is easy to criticize choices that were made 10 years ago through a modern lens, if we were to wait for all of the possible limitations to be solved before proceeding, science would stall. Even preceding the development of mothur, it was necessary to address the most important problems and work backwards to other problems that limited access to robust sequence analysis tools. At the same time, we strive to expand mothur’s capabilities in a data-driven manner to incorporate new ideas and accommodate changes in data and desires of the research community. It has been edifying to see the benefit that a simple set of tools can bring to so many other researchers.
