Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance

The abundance of different SSU rRNA (“16S”) gene sequences in environmental samples is widely used in microbial ecology as a measure of community structure and diversity. However, the genomic copy number of the 16S gene varies greatly, from one in many species to as many as 15 in some bacteria and hundreds in some microbial eukaryotes. As a result, differences in the relative abundance of 16S genes in environmental samples can reflect both variation in the relative abundance of different organisms and variation in genomic 16S copy number among those organisms. Despite this, many studies treat the abundance of 16S gene sequences as a surrogate for the relative abundance of the organisms containing those sequences. Here we present a method that uses data on the sequences and genomic copy number of 16S genes, along with phylogenetic placement and ancestral state estimation, to estimate organismal abundances from environmental DNA sequence data. We use theory and simulations to demonstrate that 16S genomic copy number can be accurately estimated from the short reads typically obtained from high-throughput environmental sequencing of the 16S gene, and that organismal abundances in microbial communities are more strongly correlated with the estimated abundances obtained from our method than with gene abundances. We re-analyze several published empirical data sets and demonstrate that using gene abundance rather than estimated organismal abundance can lead to different inferences about community diversity and structure and about the identity of the dominant taxa in microbial communities. Our approach will allow microbial ecologists to make more accurate inferences about microbial diversity and abundance based on 16S sequence data.
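To make the distinction between gene abundance and organismal abundance concrete, the following is a minimal Python sketch of the general idea of copy-number correction: each taxon's 16S read count is divided by its estimated genomic copy number, and the results are renormalized. It is not the method described above, which obtains copy-number estimates through phylogenetic placement and ancestral state estimation; here the estimates are simply assumed to be given. The function name, taxon labels, and numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict


def organismal_relative_abundance(
    gene_counts: Dict[str, float],
    copy_numbers: Dict[str, float],
) -> Dict[str, float]:
    """Convert 16S gene counts to estimated organismal relative abundances.

    gene_counts:  16S read count per taxon (hypothetical input).
    copy_numbers: estimated 16S copies per genome for each taxon,
                  assumed here to be supplied by some upstream step.
    """
    corrected = {}
    for taxon, count in gene_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        # One organism contributes `copies` gene sequences, so divide.
        corrected[taxon] = count / copies

    total = sum(corrected.values())
    if total == 0:
        raise ValueError("all corrected abundances are zero")
    # Renormalize so the estimated abundances sum to 1.
    return {taxon: value / total for taxon, value in corrected.items()}


# Illustrative values only: two taxa with equal gene counts
# but a tenfold difference in 16S copy number.
gene_counts = {"taxon_A": 100, "taxon_B": 100}
copy_numbers = {"taxon_A": 1.0, "taxon_B": 10.0}
abundances = organismal_relative_abundance(gene_counts, copy_numbers)
```

In this toy case both taxa account for half of the 16S reads, yet after correction taxon_A makes up 10/11 of the estimated organisms and taxon_B only 1/11, which shows how the choice between gene abundance and organismal abundance can change which taxon appears dominant.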
