Structural genomics analysis of uncharacterized protein families overrepresented in human gut bacteria identifies a novel glycoside hydrolase

Background: Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism.

Results: BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length homologs and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in the C-terminal domains of other glycosyl hydrolases; these domains, however, are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationships and their possible functional implications.

Conclusions: Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this, we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain, respectively.
