SweetNET: A Bioinformatics Workflow for Glycopeptide MS/MS Spectral Analysis.

Glycoproteomics has rapidly become an independent analytical platform that bridges glycomics and proteomics to address site-specific protein glycosylation and its impact on biology. Current glycopeptide characterization relies on time-consuming manual interpretation and demands a high level of expertise. Efficient data interpretation is one of the major challenges that must be overcome before true high-throughput glycopeptide analysis can be achieved, so the development of new glyco-related bioinformatics tools is crucial to reaching this goal. Here we present SweetNET, a data-oriented bioinformatics workflow for the efficient analysis of hundreds of thousands of glycopeptide MS/MS spectra. We analyzed MS data sets from two separate glycopeptide enrichment protocols, targeting sialylated glycopeptides and chondroitin sulfate (CS) linkage region glycopeptides, respectively. Molecular networking was performed to organize the glycopeptide MS/MS data by spectral similarity. Together, spectral clustering, oxonium ion intensity profiles, and precursor ion m/z shift distributions provided characteristic signatures for the initial assignment of the different N-, O-, and CS-glycopeptide classes and their respective glycoforms. These signatures were then used to guide database searches, leading to the identification and validation of a large number of glycopeptide variants, including novel deoxyhexose (fucose) modifications in the linkage region of chondroitin sulfate proteoglycans.
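The abstract combines three spectral signals: pairwise spectral similarity for molecular networking, oxonium ion intensity profiles, and precursor m/z shifts between related spectra. The following is a minimal Python sketch of how these signals might be computed. It is not SweetNET's implementation. It assumes centroided peak lists given as `(m/z, intensity)` tuples. The function names (`bin_spectrum`, `cosine`, `build_network`, `oxonium_profile`), the 0.5 m/z bin width, the 0.7 cosine cutoff and the 0.01 m/z tolerance are all hypothetical choices. The sketch uses a plain binned cosine score. Molecular networking tools such as GNPS commonly use a modified cosine that also matches fragment peaks shifted by the precursor mass difference.

```python
import math
from collections import defaultdict

# Singly charged oxonium ion m/z values commonly used as glycan markers.
OXONIUM_IONS = {
    "HexNAc": 204.0867,
    "NeuAc-H2O": 274.0921,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}


def bin_spectrum(peaks, bin_width=0.5):
    """Sum intensities into fixed-width m/z bins and L2-normalize (hypothetical helper)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in binned.values()))
    return {k: v / norm for k, v in binned.items()} if norm > 0 else {}


def cosine(a, b):
    """Dot product of two normalized binned spectra, i.e. their cosine similarity."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def build_network(spectra, min_cosine=0.7):
    """Link spectra whose cosine score passes the cutoff.

    spectra maps spectrum id -> (precursor_mz, peaks). Each edge records the
    precursor m/z shift, which can point to a glycan composition difference.
    """
    binned = {sid: bin_spectrum(peaks) for sid, (_, peaks) in spectra.items()}
    ids = sorted(spectra)
    edges = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = cosine(binned[a], binned[b])
            if score >= min_cosine:
                delta_mz = spectra[b][0] - spectra[a][0]
                edges.append((a, b, score, delta_mz))
    return edges


def oxonium_profile(peaks, tol=0.01):
    """Fraction of total ion intensity matched to each oxonium ion within tol."""
    total = sum(intensity for _, intensity in peaks)
    profile = {}
    for name, ref in OXONIUM_IONS.items():
        matched = sum(i for mz, i in peaks if abs(mz - ref) <= tol)
        profile[name] = matched / total if total else 0.0
    return profile


if __name__ == "__main__":
    glyco = [(204.0867, 100.0), (274.0921, 40.0), (292.1027, 60.0), (500.0, 20.0)]
    spectra = {
        "A": (1000.0, glyco),
        # Same fragments; precursor shifted by one NeuAc (291.0954 Da) at charge 2+.
        "B": (1000.0 + 291.0954 / 2, [(mz, 2 * i) for mz, i in glyco]),
        "C": (900.0, [(150.0, 100.0), (800.0, 50.0)]),
    }
    for a, b, score, delta_mz in build_network(spectra):
        print(a, b, round(score, 3), "mass shift at 2+:", round(delta_mz * 2, 4))
    print(oxonium_profile(spectra["A"][1]))
```

In this toy example only spectra A and B are linked. Multiplying the edge's precursor m/z shift by the charge state (2+) recovers a 291.0954 Da difference, which corresponds to one NeuAc residue. An oxonium profile dominated by HexNAc and NeuAc ions is the kind of signature the abstract describes for sialylated glycopeptides.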
