W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences

Background: For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA, also known as 16S rRNA, is encoded by ribosomal DNA (16S rDNA) and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly.

Results: We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, and phylogenetic tree construction, as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented "workflow" approach using the open-source Kepler system as a platform.

Conclusions: By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired, and we see WATERS as an initial seed for a sizeable and growing repository of interoperable, easy-to-combine tools for asking increasingly complex microbial ecology questions.
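As a rough illustration of the ecological analysis step named in the Results, the sketch below computes two widely used summary statistics from a vector of per-OTU sequence counts: the Shannon diversity index and the bias-corrected Chao1 richness estimate. It is not taken from WATERS, which delegates such calculations to the external tools it bundles; the function names and the example counts are hypothetical and chosen only for this example.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


if __name__ == "__main__":
    # Hypothetical OTU table row: number of sequences assigned to each OTU.
    otu_counts = [10, 5, 2, 1, 1]
    print("Shannon index:", shannon_index(otu_counts))
    print("Chao1 estimate:", chao1(otu_counts))
```

In a workflow setting, each function would correspond to one small step that consumes the OTU counts produced upstream, which is the kind of composable unit the abstract refers to as an "actor".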
