The Carbohydrate-Active Enzyme (CAZy) Database: Principles and Usage Guidelines

Carbohydrate-Active enZymes (CAZymes) assemble, break down, and modify glycans and glycoconjugates using their catalytic and binding modules (functional protein domains). Since 1998, the CAZy database has offered an online, continuously updated classification of CAZyme modules (Lombard et al. 2014). Each module family in the CAZy classification was created from experimentally characterized protein modules described in the literature, and the families are populated with related module sequences from public protein sequence databases. Because no universal threshold allows the systematic classification of the various CAZyme families, CAZy annotations result from an expert combination of module modeling/calibration and human curation. CAZy annotations are made publicly available for all proteins released by GenBank (Benson et al. 2012), Swiss-Prot (Boutet et al. 2016), and the Protein Data Bank (PDB; http://www.rcsb.org; Berman et al. 2000). Further, functional and 3-D structural information, curated regularly from the literature, constitutes an essential added value of the CAZy annotation. In this spirit, the display of ligand information from crystallographic complexes has recently been developed (Lombard et al. 2014). This chapter guides the reader through the use of CAZy to search enzyme annotations. It also answers frequent questions such as (i) how to obtain CAZy annotations for a specific protein, a genome, or a metagenome, (ii) how to have a newly characterized family included in the CAZy classification scheme, (iii) why CAZy does not cover all protein families related to glycans/glycoconjugates, and (iv) why CAZy does not transfer functional annotation to similar sequences. Finally, we present a recent CAZy-associated tool, namely the Polysaccharide Utilization Loci (PUL) predictor and database for Bacteroidetes species (Terrapon et al. 2015).

[1] G. J. Davies, et al. A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. The Biochemical Journal, 1997.

[2] Vincent Lombard, et al. Automatic prediction of polysaccharide utilization loci in Bacteroidetes species. Bioinform., 2015.

[3] L. Stein, et al. JBrowse: a next-generation genome browser. Genome Research, 2009.

[4] Edwin Pozharski, et al. Consolidation of glycosyl hydrolase family 30: A dual domain 4/7 hydrolase family consisting of two structurally distinct groups. FEBS Letters, 2010.

[5] B. Henrissat, et al. How do gut microbes break down dietary fiber? Trends in Biochemical Sciences, 2014.

[6] T. N. Bhat, et al. The Protein Data Bank. Nucleic Acids Res., 2000.

[7] Pedro M. Coutinho, et al. Carbohydrate-active enzymes: an integrated database approach. 1999.

[8] I. Xenarios, et al. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View. Methods in Molecular Biology, 2016.

[9] Abigail A. Salyers, et al. Characterization of Four Outer Membrane Proteins Involved in Binding Starch to the Cell Surface of Bacteroides thetaiotaomicron. Journal of Bacteriology, 2000.

[10] B. Henrissat, et al. Expansion of the enzymatic repertoire of the CAZy database to integrate auxiliary redox enzymes. Biotechnology for Biofuels, 2013.

[11] Vincent Lombard, et al. Dividing the Large Glycoside Hydrolase Family 43 into Subfamilies: a Motivation for Detailed Enzyme Characterization. Applied and Environmental Microbiology, 2016.

[12] Pedro M. Coutinho, et al. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res., 2013.

[13] D. Bolam, et al. Carbohydrate-binding modules: fine-tuning polysaccharide recognition. The Biochemical Journal, 2004.

[14] Bernard Henrissat, et al. Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of alpha-amylase-related proteins. Protein Engineering, Design & Selection: PEDS, 2006.

[15] H. Brumer, et al. A discrete genetic locus confers xyloglucan metabolism in select human gut Bacteroidetes. Nature, 2014.

[16] B. Henrissat, et al. Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5). BMC Evolutionary Biology, 2012.

[17] Vincent Lombard, et al. A hierarchical classification of polysaccharide lyases for glycogenomics. The Biochemical Journal, 2010.

[18] Bernard Henrissat, et al. An evolving hierarchical family classification for glycosyltransferases. Journal of Molecular Biology, 2003.

[19] Claus-Wilhelm von der Lieth, et al. pdb-care (PDB CArbohydrate REsidue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC Bioinformatics, 2004.

[20] Peter Williams, et al. IMG: the integrated microbial genomes database and comparative analysis system. Nucleic Acids Res., 2011.