Rethinking microbial diversity analysis in the high throughput sequencing era.

The analysis of amplified and sequenced 16S rRNA genes has become the most important single approach for microbial diversity studies. New sequencing technologies allow thousands of reads to be sequenced in a single run, and a cost-effective option is to split a single run across many samples. For this type of investigation, however, the key question is how many samples can be sequenced without biasing the results through a lack of sequence representativeness. In this work we demonstrate that the level of sequencing effort used for analyzing soil microbial communities biases the results and determines the most effective type of analysis for small and large datasets. Many simulations were performed with four independent pyrosequencing-generated 16S rRNA gene libraries from different environments. The analysis illustrates the lack of resolution of OTU-based approaches for datasets with low sequence coverage; such analyses should be performed with at least 90% sequence coverage. Diversity index values increase with sample size, which makes normalizing the number of sequences across all samples crucial. An important finding of this study is the advantage of phylogenetic approaches for examining microbial communities with low sequence coverage. However, if the environments being compared are closely related, deeper sequencing is necessary to detect variation in microbial composition.
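The two practical checks named above, a coverage threshold and normalization of sequence counts, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "sequence coverage" means Good's estimator, C = 1 - n1/N (n1 = number of OTUs seen exactly once, N = total reads), and that normalization means random subsampling without replacement to the depth of the smallest sample. The function names and the OTU count vectors are hypothetical.

```python
import random
from collections import Counter


def goods_coverage(otu_counts):
    """Good's coverage estimate: 1 - (singleton OTUs / total reads)."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1.0 - singletons / total


def rarefy(otu_counts, depth, seed=0):
    """Subsample reads without replacement down to a fixed depth."""
    # Expand the count vector into one entry per read, labelled by OTU index.
    reads = [otu for otu, c in enumerate(otu_counts) for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    picked = Counter(rng.sample(reads, depth))
    return [picked.get(otu, 0) for otu in range(len(otu_counts))]


# Hypothetical OTU count vectors for two samples (illustrative values only).
sample_a = [50, 30, 10, 5, 3, 1, 1]
sample_b = [20, 10, 5, 2, 1, 1, 1]

# Normalize both samples to the depth of the smaller one before comparing.
depth = min(sum(sample_a), sum(sample_b))
for name, counts in (("A", sample_a), ("B", sample_b)):
    sub = rarefy(counts, depth)
    flag = "ok" if goods_coverage(sub) >= 0.90 else "below 90% coverage"
    print(name, sum(sub), round(goods_coverage(sub), 3), flag)
```

Subsampling changes the number of singletons, so coverage is recomputed on the normalized counts rather than carried over from the full sample.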
