An approach for N-linked glycan identification from MS/MS spectra by target-decoy strategy

Glycan structure determination is an essential step in the thorough investigation of the structure and function of proteins. Appropriate sample preparation followed by tandem mass spectrometry has emerged as the dominant technique for characterizing glycans and glycopeptides. Although extensive effort has been devoted to developing computational approaches for the automated interpretation of glycopeptide spectra, previously published methods lack a reasonable quality control strategy for the statistical validation of reported results. In this manuscript, we introduce a novel method that constructs a decoy glycan database from the glycan structures in the target database and searches the experimental spectra against both the target and decoy databases to find the best-matched glycans. Specifically, the search procedure applies a two-layer scoring scheme that calculates a normalized matching score, which enables unbiased ranking of the matched glycans. Experimental analysis shows that the proposed method reports more high-confidence structures than previous approaches.
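As a rough illustration of how a target-decoy search can support statistical validation, the sketch below keeps the best-scoring match per spectrum across the combined target and decoy results, ranks those matches by score, and accepts target matches up to the lowest-ranked position at which the estimated false discovery rate stays within a limit. The abstract does not specify the decoy construction, the two-layer scoring scheme, or the error-rate estimator, so everything here is an assumption: the `Match` record, the function names, and the commonly used estimate of decoy hits divided by target hits are hypothetical stand-ins rather than the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One spectrum-to-glycan match (hypothetical record, not from the paper)."""
    spectrum_id: str
    glycan_id: str
    score: float      # assumed normalized matching score, higher is better
    is_decoy: bool    # True if the glycan came from the decoy database


def best_match_per_spectrum(matches):
    """Keep only the highest-scoring match for each spectrum."""
    best = {}
    for m in matches:
        current = best.get(m.spectrum_id)
        if current is None or m.score > current.score:
            best[m.spectrum_id] = m
    return list(best.values())


def accept_at_fdr(matches, max_fdr=0.01):
    """Return target matches accepted at the given estimated FDR.

    Assumption: FDR is estimated as (#decoy hits) / (#target hits) among
    all matches ranked at or above a given position.
    """
    ranked = sorted(best_match_per_spectrum(matches),
                    key=lambda m: m.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked matches to keep
    for position, m in enumerate(ranked, start=1):
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        estimated_fdr = decoys / max(targets, 1)
        if estimated_fdr <= max_fdr:
            cutoff = position  # deepest position still within the limit
    return [m for m in ranked[:cutoff] if not m.is_decoy]
```

Ranking on a single normalized score is what makes such a cutoff meaningful: if scores were not comparable across spectra and across glycans of different sizes, the decoy count above a threshold would not be a fair proxy for the number of incorrect target matches.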
