dbCAN2: a meta server for automated carbohydrate-active enzyme annotation

Abstract

Complex carbohydrates of plants are the main food sources of animals and microbes, and serve as a promising renewable feedstock for biofuel and biomaterial production. Carbohydrate-active enzymes (CAZymes) are the most important enzymes for complex carbohydrate metabolism. With an increasing number of plant and plant-associated microbial genomes and metagenomes being sequenced, there is an urgent need for automated tools for genomic data mining of CAZymes. We developed the dbCAN web server in 2012 to provide a public service for automated CAZyme annotation of newly sequenced genomes. Here, we present dbCAN2 (http://cys.bios.niu.edu/dbCAN2), an updated meta server that integrates three state-of-the-art tools for CAZome (all CAZymes of a genome) annotation: (i) HMMER search against the dbCAN HMM (hidden Markov model) database; (ii) DIAMOND search against the CAZy pre-annotated CAZyme sequence database; and (iii) Hotpep search against the conserved CAZyme short peptide database. Combining the three outputs and removing CAZymes found by only one tool can significantly improve CAZome annotation accuracy. In addition, dbCAN2 now accepts nucleotide sequence submissions and offers a service to predict physically linked CAZyme gene clusters (CGCs), which should be useful for identifying putative polysaccharide utilization loci (PULs) in microbial genomes and metagenomes.
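The consensus step described in the abstract (keep a CAZyme only if more than one tool reports it) can be illustrated with a minimal Python sketch. This is not the dbCAN2 implementation: the function name `consensus_cazymes`, the `min_tools` parameter, and the protein IDs are hypothetical, and each tool's output is assumed to have already been parsed into a set of protein IDs.

```python
from collections import defaultdict


def consensus_cazymes(hmmer_hits, diamond_hits, hotpep_hits, min_tools=2):
    """Return proteins reported by at least `min_tools` of the three tools.

    Each argument is an iterable of protein IDs (hypothetical, pre-parsed
    from the respective tool's output). The result maps each retained
    protein ID to the sorted list of tools that support it.
    """
    support = defaultdict(set)
    tool_outputs = (
        ("HMMER", hmmer_hits),
        ("DIAMOND", diamond_hits),
        ("Hotpep", hotpep_hits),
    )
    # Record which tools reported each protein.
    for tool, hits in tool_outputs:
        for protein_id in hits:
            support[protein_id].add(tool)
    # Drop proteins supported by fewer than `min_tools` tools.
    return {
        protein_id: sorted(tools)
        for protein_id, tools in support.items()
        if len(tools) >= min_tools
    }


# Illustrative input only; these IDs are made up.
hmmer = {"prot1", "prot2", "prot3"}
diamond = {"prot2", "prot3", "prot4"}
hotpep = {"prot3", "prot5"}

result = consensus_cazymes(hmmer, diamond, hotpep)
```

With the made-up input above, `prot2` and `prot3` are retained, while `prot1`, `prot4`, and `prot5` are dropped because only one tool reports each of them. Setting `min_tools=1` would instead return the union of all three outputs.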
