Expansion of the Protein Repertoire in Newly Explored Environments: Human Gut Microbiome Specific Protein Families

Microbes that inhabit a particular environment must be able to perform molecular functions that give them a competitive advantage in that environment. Because most molecular functions are performed by proteins and are conserved between related proteins, we can expect organisms that succeed in a given environmental niche to contain protein families specific to functions that are important there. For instance, the human gut is rich in polysaccharides from the diet or secreted by the host, and is dominated by Bacteroides, whose genomes contain a highly expanded repertoire of protein families involved in carbohydrate metabolism. To identify other protein families that are specific to this environment, we investigated the distribution of protein families in the currently available human gut genomic and metagenomic data. Using an automated procedure, we identified a group of protein families strongly overrepresented in the human gut. These include not only many families described previously but also, interestingly, a large group of previously unrecognized protein families, which suggests that we still have much to discover about this environment. The identification and analysis of these families could provide new information about an environment critical to our health and well-being.
