Microbial Community Analysis with Ribosomal Gene Fragments from Shotgun Metagenomes

ABSTRACT
Shotgun metagenomic sequencing does not depend on gene-targeted primers or PCR amplification; thus, it is not affected by primer bias or PCR chimeras. However, searching for rRNA genes in large shotgun Illumina data sets is computationally expensive, and no approach exists for unsupervised community analysis of small-subunit (SSU) rRNA gene fragments retrieved from shotgun data. We present a pipeline, SSUsearch, that identifies SSU rRNA gene fragments faster and enables unsupervised community analysis of shotgun data. It also includes classification and copy number correction, and its output can be used by traditional amplicon analysis platforms. On shotgun metagenome data, this pipeline yielded higher diversity estimates than amplicon data did but preserved the grouping of samples in ordination analyses. We applied the pipeline to soil samples with paired shotgun and amplicon data and confirmed a bias against Verrucomicrobia in a commonly used V6-V8 primer set; we also discovered a likely bias against Actinobacteria and in favor of Verrucomicrobia in a commonly used V4 primer set. The pipeline can use all variable regions of the SSU rRNA gene and can also be applied to large-subunit (LSU) rRNA genes to confirm community structure. It scales to large soil metagenomic data sets (5 Gb of memory and 5 CPU hours to process 38 Gb [1 lane] of trimmed Illumina HiSeq2500 data) and is freely available at https://github.com/dib-lab/SSUsearch under a BSD license.
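The copy number correction step can be illustrated with a minimal Python sketch. It assumes the common approach of dividing each taxon's fragment count by its rRNA gene copy number and then renormalizing to relative abundances. The function name, the input dictionaries, and the fallback copy number of 1 are hypothetical; this is not SSUsearch's own implementation or copy number database.

```python
def copy_number_correct(
    counts: dict[str, int],
    copy_numbers: dict[str, float],
    default_copies: float = 1.0,
) -> dict[str, float]:
    """Return copy-number-corrected relative abundances (illustrative sketch).

    counts: fragments assigned to each taxon (hypothetical input format).
    copy_numbers: assumed rRNA gene copies per genome for each taxon.
    default_copies: fallback for taxa with no known copy number (an assumption).
    """
    adjusted = {}
    for taxon, n_reads in counts.items():
        copies = copy_numbers.get(taxon, default_copies)
        if copies <= 0:
            raise ValueError(f"copy number for {taxon!r} must be positive")
        # A genome with k rRNA copies contributes roughly k times as many
        # fragments, so dividing by k estimates relative genome (cell) counts.
        adjusted[taxon] = n_reads / copies
    total = sum(adjusted.values())
    if total == 0:
        return {taxon: 0.0 for taxon in adjusted}
    # Renormalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in adjusted.items()}


# Illustrative numbers only: equal fragment counts, but taxon "A" is assumed
# to carry 4 rRNA operons and taxon "B" only 1.
print(copy_number_correct({"A": 100, "B": 100}, {"A": 4, "B": 1}))
# {'A': 0.2, 'B': 0.8}
```

Without correction both taxa would appear at 50%; after correction the taxon with more operons drops to 20%, because its fragments came from fewer genomes.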


[33]  P. Bork,et al.  A human gut microbial gene catalogue established by metagenomic sequencing , 2010, Nature.

[34]  Patrick D. Schloss,et al.  Assessing and Improving Methods Used in Operational Taxonomic Unit-Based Approaches for 16S rRNA Gene Sequence Analysis , 2011, Applied and Environmental Microbiology.

[35]  Haixu Tang,et al.  Comparing Bacterial Communities Inferred from 16s Rrna Gene Sequencing and Shotgun Metagenomics , 2011, Pacific Symposium on Biocomputing.

[36]  M. Hartmann,et al.  Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets , 2011, Antonie van Leeuwenhoek.

[37]  Dan-Ping Mao,et al.  Coverage evaluation of universal bacterial primers using the metagenomic datasets , 2012, BMC Microbiology.

[38]  Jizhong Zhou,et al.  Reproducibility and quantitation of amplicon sequencing-based detection , 2011, The ISME Journal.

[39]  Kuan-Liang Liu,et al.  Accurate, Rapid Taxonomic Classification of Fungal Large-Subunit rRNA Genes , 2011, Applied and Environmental Microbiology.

[40]  Rob Knight,et al.  Using QIIME to Analyze 16S rRNA Gene Sequences from Microbial Communities , 2011, Current protocols in bioinformatics.

[41]  B. Haas,et al.  Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. , 2011, Genome research.

[42]  Scott T. Bates,et al.  The under-recognized dominance of Verrucomicrobia in soil bacterial communities. , 2011, Soil biology & biochemistry.

[43]  Jonathan A. Eisen,et al.  PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data , 2011, PLoS Comput. Biol..

[44]  Jae-Hak Lee,et al.  rRNASelector: A computer program for selecting ribosomal RNA encoding sequences from metagenomic and metatranscriptomic shotgun libraries , 2011, The Journal of Microbiology.

[45]  Steven Salzberg,et al.  BIOINFORMATICS ORIGINAL PAPER , 2004 .

[46]  G. Bronner,et al.  Comparison of 16S rRNA and protein-coding genes as molecular markers for assessing microbial diversity (Bacteria and Archaea) in ecosystems. , 2011, FEMS microbiology ecology.

[47]  William A. Walters,et al.  Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms , 2012, The ISME Journal.

[48]  Robert A. Edwards,et al.  Identification and removal of ribosomal RNA sequences from metatranscriptomes , 2011, Bioinform..

[49]  Robert C. Edgar,et al.  Defining the core Arabidopsis thaliana root microbiome , 2012, Nature.

[50]  Teresita M. Porter,et al.  Factors That Affect Large Subunit Ribosomal DNA Amplicon Sequencing Studies of Fungal Communities: Classification Method, Primer Choice, and Error , 2012, PloS one.

[51]  A. Klindworth,et al.  Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies , 2012, Nucleic acids research.

[52]  C. Kuske,et al.  From Genus to Phylum: Large-Subunit and Internal Transcribed Spacer rRNA Operon Regions Show Similar Classification Accuracies Influenced by Database Composition , 2013, Applied and Environmental Microbiology.

[53]  C. Quince,et al.  Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities. , 2013, Environmental microbiology.

[54]  C. Hawkes,et al.  Differences in fungal and bacterial physiology alter soil carbon and nitrogen cycling: insights from meta-analysis and theoretical models. , 2013, Ecology letters.

[55]  Alexandros Stamatakis,et al.  Metagenomic species profiling using universal phylogenetic marker genes , 2013, Nature Methods.

[56]  Pelin Yilmaz,et al.  The SILVA ribosomal RNA gene database project: improved data processing and web-based tools , 2012, Nucleic Acids Res..

[57]  Kessy Abarenkov,et al.  Fungal community analysis by high-throughput sequencing of amplified markers – a user's guide , 2013, The New phytologist.

[58]  Tong Zhang,et al.  Taxonomic Precision of Different Hypervariable Regions of 16S rRNA Gene and Annotation Methods for Functional Bacterial Groups in Biological Wastewater Treatment , 2013, PloS one.

[59]  Jizhong Zhou,et al.  Soil Microbial Community Responses to a Decade of Warming as Revealed by Comparative Metagenomics , 2013, Applied and Environmental Microbiology.

[60]  Florent E. Angly,et al.  CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction , 2014, Microbiome.

[61]  Francisco M. Cornejo-Castillo,et al.  Metagenomic 16S rDNA Illumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities. , 2014, Environmental microbiology.

[62]  Holly M. Bik,et al.  PhyloSift: phylogenetic analysis of genomes and metagenomes , 2014, PeerJ.

[63]  James R. Cole,et al.  Ribosomal Database Project: data and tools for high throughput rRNA analysis , 2013, Nucleic Acids Res..

[64]  S. Tringe,et al.  High-Throughput Metagenomic Technologies for Complex Microbial Community Analysis: Open and Closed Formats , 2015, mBio.

[65]  J. Fuhrman,et al.  Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. , 2016, Environmental microbiology.

[66]  Henry C. M. Leung,et al.  PhyloSift: phylogenetic analysis of genomes and metagenomes , 2014, PeerJ.

[67]  J. Tiedje,et al.  Influence of corn, switchgrass, and prairie cropping systems on soil microbial communities in the upper Midwest of the United States , 2016 .