Deriving enzymatic and taxonomic signatures of metagenomes from short read data

Background: We propose a method for deriving enzymatic signatures from short read metagenomic data of unknown species. The short read data are converted to six pseudo-peptide candidates. We search for occurrences of Specific Peptides (SPs) on the latter. SPs are peptides that are indicative of enzymatic function as defined by the Enzyme Commission (EC) nomenclature. The number of SP hits on an ensemble of short reads is counted and then converted to estimates of numbers of enzymatic genes associated with different EC categories in the studied metagenome. Relative amounts of different EC categories define the enzymatic spectrum, without the need to perform genomic assemblies of short reads.

Results: The method is developed and tested on 22 bacteria for which there exist many EC annotations in Uniprot. Enzymatic signatures are derived for 3 metagenomes, and their functional profiles are explored. We extend the SP methodology to taxon-specific SPs (TSPs), allowing us to estimate taxonomic features of metagenomic data from short reads. Using recent Swiss-Prot data we obtain TSPs for different phyla of bacteria, and different classes of proteobacteria. These allow us to analyze the major taxonomic content of 4 different metagenomic data-sets.

Conclusions: The SP methodology can be successfully extended to applications on short read genomic and metagenomic data. This leads to direct derivation of enzymatic signatures from raw short reads. Furthermore, by employing TSPs, one obtains valuable taxonomic information.
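The pipeline described in the abstract (six-frame translation of each read, SP lookup, hit counting per EC category) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy three-residue "SPs", and their EC assignments are hypothetical, and the final step simply normalizes raw hit counts to fractions rather than applying the paper's conversion from hits to estimated gene numbers, which the abstract does not specify.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(read):
    """Return the six pseudo-peptides of a read: 3 forward and 3 reverse frames."""
    read = read.upper()
    reverse = read.translate(COMPLEMENT)[::-1]
    return [translate(strand[o:]) for strand in (read, reverse) for o in range(3)]


def ec_spectrum(reads, sp_to_ec):
    """Count SP hits per EC category and return relative hit fractions.

    Assumption: sp_to_ec maps each peptide string to one EC label.
    A hit is counted once per frame in which the peptide occurs.
    """
    hits = Counter()
    for read in reads:
        for peptide in six_frames(read):
            for sp, ec in sp_to_ec.items():
                if sp in peptide:
                    hits[ec] += 1
    total = sum(hits.values())
    return {ec: n / total for ec, n in hits.items()} if total else {}


if __name__ == "__main__":
    # Hypothetical toy data; real SPs are longer and come from a curated list.
    toy_sps = {"MAK": "1.1.1.1", "FGH": "2.7.1.1"}
    toy_reads = ["ATGGCCAAA", "TTTGGCCAT"]
    print(six_frames(toy_reads[0]))
    print(ec_spectrum(toy_reads, toy_sps))
```

The same counting scheme would apply to taxon-specific SPs by mapping each peptide to a taxon label instead of an EC category. For a realistic SP list, the inner loop over all peptides would likely be replaced by a multi-pattern search structure, since a linear scan per frame scales poorly.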
