Automatic prediction of polysaccharide utilization loci in Bacteroidetes species

MOTIVATION: A bacterial polysaccharide utilization locus (PUL) is a set of physically linked genes that orchestrate the breakdown of a specific glycan. PULs are prevalent in the Bacteroidetes phylum and are key to the digestion of complex carbohydrates, notably by the human gut microbiota. A given Bacteroidetes genome can encode dozens of different PULs whose boundaries and precise gene content are difficult to predict.

RESULTS: Here, we present a fully automated approach for PUL prediction using genomic context and domain annotation alone. By combining the detection of a pair of marker genes with operon prediction based on intergenic distances and with queries to the carbohydrate-active enzymes database (www.cazy.org), our predictor achieved above 86% accuracy in two Bacteroides species with extensive experimental PUL characterization.

AVAILABILITY AND IMPLEMENTATION: PUL predictions in 67 Bacteroidetes genomes from the human gut microbiota and in two additional species, one from the canine oral sphere and one from the environment, are presented in our database, accessible at www.cazy.org/PULDB/index.php.
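The prediction strategy summarized in the abstract (find a marker-gene pair, grow the locus by operon-style intergenic distances, then flag carbohydrate-active enzymes) can be illustrated with a minimal sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: the marker pair is taken to be an adjacent susC-like/susD-like tandem (the abstract does not name the markers), the 100 bp intergenic cutoff `MAX_GAP` is a hypothetical value, and the `Gene` record, the domain labels, and the use of CAZy family prefixes in place of a real database query are all invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int
    end: int
    strand: str
    domains: frozenset = frozenset()  # domain/family labels (hypothetical)

MAX_GAP = 100  # hypothetical intergenic distance cutoff, in bp
CAZY_PREFIXES = ("GH", "PL", "CE", "CBM", "GT")  # CAZy family class prefixes

def same_operon(a, b, max_gap=MAX_GAP):
    """Crude operon rule: same strand and a short intergenic gap."""
    return a.strand == b.strand and (b.start - a.end) <= max_gap

def is_marker_pair(a, b):
    """Assumed marker: adjacent SusC-like and SusD-like genes, same strand."""
    return a.strand == b.strand and (
        ("SusC" in a.domains and "SusD" in b.domains)
        or ("SusD" in a.domains and "SusC" in b.domains)
    )

def is_cazyme(gene):
    """Stand-in for a CAZy lookup: match on family-name prefix only."""
    return any(d.startswith(CAZY_PREFIXES) for d in gene.domains)

def predict_puls(genes, max_gap=MAX_GAP):
    """Return a list of loci, each a list of genes in genomic order."""
    genes = sorted(genes, key=lambda g: g.start)
    puls, i = [], 0
    while i < len(genes) - 1:
        if not is_marker_pair(genes[i], genes[i + 1]):
            i += 1
            continue
        lo, hi = i, i + 1
        # Extend the locus in both directions while the operon rule holds.
        while lo > 0 and same_operon(genes[lo - 1], genes[lo], max_gap):
            lo -= 1
        while hi < len(genes) - 1 and same_operon(genes[hi], genes[hi + 1], max_gap):
            hi += 1
        puls.append(genes[lo:hi + 1])
        i = hi + 1
    return puls

# Toy contig with made-up coordinates and annotations.
demo = [
    Gene("g1", 0, 900, "+"),
    Gene("g2", 2000, 3000, "+", frozenset({"GH13"})),
    Gene("g3", 3050, 6000, "+", frozenset({"SusC"})),
    Gene("g4", 6020, 7500, "+", frozenset({"SusD"})),
    Gene("g5", 7550, 9000, "+", frozenset({"GH97"})),
    Gene("g6", 9500, 10000, "-"),
]

for locus in predict_puls(demo):
    print([g.name for g in locus], "CAZymes:", [g.name for g in locus if is_cazyme(g)])
```

In this toy contig, `g1` is excluded because its gap to `g2` exceeds the cutoff, and `g6` is excluded because it lies on the opposite strand. A real predictor would need a calibrated distance model and genuine family assignments rather than prefix matching.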
