Structure- and context-based analysis of the GxGYxYP family reveals a new putative class of Glycoside Hydrolase

Background: Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain.

Results: Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333.

Conclusions: We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.
