Microbial Community Analysis with Ribosomal Gene Fragments from Shotgun Metagenomes

ABSTRACT Shotgun metagenomic sequencing does not depend on gene-targeted primers or PCR amplification; thus, it is not affected by primer bias or chimeras. However, searching for rRNA genes in large shotgun Illumina data sets is computationally expensive, and no approach exists for unsupervised community analysis of small-subunit (SSU) rRNA gene fragments retrieved from shotgun data. We present a pipeline, SSUsearch, that achieves faster identification of SSU rRNA gene fragments and enables unsupervised community analysis with shotgun data. It also includes classification and copy number correction, and its output can be used by traditional amplicon analysis platforms. Shotgun metagenome data processed with this pipeline yielded higher diversity estimates than amplicon data but retained the grouping of samples in ordination analyses. We applied the pipeline to soil samples with paired shotgun and amplicon data and confirmed a bias against Verrucomicrobia in a commonly used V6-V8 primer set; we also found a likely bias against Actinobacteria and for Verrucomicrobia in a commonly used V4 primer set. The pipeline can use all variable regions in SSU rRNA and can also be applied to large-subunit (LSU) rRNA genes for confirmation of community structure. It scales to large amounts of soil metagenomic data (5 Gb memory and 5 central processing unit hours to process 38 Gb [1 lane] of trimmed Illumina HiSeq2500 data) and is freely available at https://github.com/dib-lab/SSUsearch under a BSD license.
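
The idea behind the copy number correction mentioned in the abstract can be illustrated with a minimal sketch. This is not the SSUsearch implementation: the function name `correct_copy_number`, the taxon labels, and the copy number values are all hypothetical, chosen only to show how dividing per-taxon read counts by an assumed rRNA gene copy number and renormalizing changes the relative abundances.

```python
from typing import Dict


def correct_copy_number(
    counts: Dict[str, float],
    copies: Dict[str, float],
    default_copies: float = 1.0,
) -> Dict[str, float]:
    """Return relative abundances after dividing each taxon's read count
    by its assumed rRNA gene copy number (hypothetical helper)."""
    corrected = {}
    for taxon, n_reads in counts.items():
        # Taxa without a known copy number fall back to default_copies.
        n_copies = copies.get(taxon, default_copies)
        if n_copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = n_reads / n_copies

    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    # Renormalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in corrected.items()}


# Illustrative values only: TaxonA carries 6 rRNA gene copies, TaxonB carries 2.
example_counts = {"TaxonA": 60, "TaxonB": 40}
example_copies = {"TaxonA": 6, "TaxonB": 2}
result = correct_copy_number(example_counts, example_copies)
```

In this made-up example, TaxonA accounts for 60% of the raw reads but only one third of the corrected community, because each of its genomes contributes three times as many rRNA gene copies as a TaxonB genome.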
