ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences

Recent metagenomic studies of environmental samples suggest that microbial communities are much more diverse than previously reported, and that deep sequencing will substantially raise estimates of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, computational methods for analyzing large collections of 16S ribosomal sequences are limited. We propose a new algorithm, referred to as ESPRIT, which addresses several computational issues with prior methods. We developed two versions of ESPRIT, one for personal computers (PCs) and one for computer clusters (CCs). The PC version is intended for small- and medium-scale data sets and can process several tens of thousands of sequences within a few minutes, while the CC version is intended for large-scale problems and can analyze several hundred thousand reads within one day. Large-scale experiments demonstrate the effectiveness of the proposed algorithm. The source code and user guide are freely available at http://www.biotech.ufl.edu/people/sun/esprit.html.
