SweetOrigins: Extracting Evolutionary Information from Glycans

Glycans, the most diverse class of biopolymers and crucial for many biological processes, are shaped by evolutionary pressures stemming in particular from host-pathogen interactions. While this makes glycans essential for understanding and targeting host-pathogen interactions, their considerable diversity and a lack of methods have hitherto stymied progress in leveraging their predictive potential. Here, we use a curated dataset of 12,674 glycans from 1,726 species to develop and apply machine learning methods that extract evolutionary information from glycans. Our deep learning-based language model, SweetOrigins, provides evolution-informed glycan representations that we use to discover and investigate motifs used for molecular mimicry-mediated immune evasion by commensals and pathogens. Novel glycan alignment methods enable us to identify and contextualize virulence-determining motifs in the capsular polysaccharides of Staphylococcus aureus and Acinetobacter baumannii. Further, we show that glycan-based phylogenetic trees contain most of the information present in traditional 16S rRNA-based phylogenies and improve on the differentiation of genetically closely related but phenotypically divergent species, such as Bacillus cereus and Bacillus anthracis. Leveraging the evolutionary information inherent in glycans with machine learning methodology is poised to provide further, critically needed insights into host-pathogen interactions, sequence-to-function relationships, and the major influence of glycans on phenotypic plasticity.