New Glycoproteomics Software, GlycoPep Evaluator, Generates Decoy Glycopeptides de Novo and Enables Accurate False Discovery Rate Analysis for Small Data Sets

Glycoproteins are biologically significant large molecules that participate in numerous cellular activities. To obtain site-specific protein glycosylation information, intact glycopeptides, in which the glycan remains attached to the peptide sequence, are characterized by tandem mass spectrometry (MS/MS) methods such as collision-induced dissociation (CID) and electron transfer dissociation (ETD). Although several automated tools for this analysis have emerged, the field has no consensus on how best to determine the reliability of these tools or to report a false discovery rate (FDR). A common approach to calculating FDRs for glycopeptide analysis, adopted from the target-decoy strategy in proteomics, employs a decoy database derived from the target protein sequence database. This approach is not optimal for measuring the confidence of N-linked glycopeptide matches, however, because glycopeptide data sets are considerably smaller than peptide data sets, and the requirement of a consensus sequence for N-glycosylation further limits the number of possible decoy glycopeptides tested in a database search. To address the need for accurate FDRs in automated glycopeptide assignments, we developed GlycoPep Evaluator (GPE), a tool that measures FDRs for glycopeptide identifications without using a decoy database. GPE generates decoy glycopeptides de novo for every target glycopeptide, in a 1:20 target-to-decoy ratio. The decoys and the target glycopeptides are scored against the ETD data, and FDRs can then be calculated accurately, even for small data sets, from the number of decoy matches and the target-to-decoy ratio. GPE is freely available for download and can work with any search engine that interprets ETD data of N-linked glycopeptides. The software is provided at https://desairegroup.ku.edu/research.
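The abstract does not give the formula GPE uses, so the following is only a minimal sketch of one common way to turn decoy matches and a target-to-decoy ratio into an FDR estimate. It assumes that each decoy match stands for 1/20 of an expected false target match when 20 decoys compete with each target. The function name `estimate_fdr` and its parameters are hypothetical and are not part of GPE.

```python
def estimate_fdr(target_hits: int, decoy_hits: int, decoys_per_target: int = 20) -> float:
    """Estimate an FDR from target and decoy match counts.

    Assumption (not taken from the GPE paper): with `decoys_per_target`
    decoys competing against every target, a random match is
    `decoys_per_target` times more likely to land on a decoy than on a
    target, so the expected number of false target matches is
    decoy_hits / decoys_per_target.
    """
    if decoys_per_target <= 0:
        raise ValueError("decoys_per_target must be positive")
    if target_hits < 0 or decoy_hits < 0:
        raise ValueError("hit counts must be non-negative")
    if target_hits == 0:
        # No accepted target matches, so there is nothing to be falsely discovered.
        return 0.0
    expected_false_targets = decoy_hits / decoys_per_target
    # Cap at 1.0 because an FDR is a proportion.
    return min(1.0, expected_false_targets / target_hits)


if __name__ == "__main__":
    # Hypothetical counts: 50 spectra assigned to targets, 10 assigned to decoys.
    print(f"Estimated FDR: {estimate_fdr(50, 10):.3f}")
```

Under this assumption, the large number of decoys per target is what makes the estimate usable for small data sets: even a handful of spectra yields enough decoy comparisons to resolve low error rates.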
