SweetSEQer, Simple de Novo Filtering and Annotation of Glycoconjugate Mass Spectra

The past 15 years have seen significant progress in LC-MS/MS peptide sequencing, including the advent of successful de novo and database search methods. However, analysis of glycopeptide and, more generally, glycoconjugate spectra remains a much more open problem, and much annotation is still performed manually. This is partly because glycans, unlike peptides, need not be linear chains; their structures are instead described by trees. In this study, we introduce SweetSEQer, an extremely simple open-source tool for identifying potential glycopeptide MS/MS spectra. We evaluate SweetSEQer on manually curated glycoconjugate spectra and on negative controls, and we demonstrate high-quality filtering that can easily be improved for specific applications. We also demonstrate high overlap between the peaks annotated by experts and those annotated by SweetSEQer, and we show that the inferred glycan graphs are consistent with canonical glycan tree motifs. This study thus presents a novel tool for annotating LC-MS/MS spectra and producing glycan graphs from them; evaluated on manually curated data, the tool performs similarly to an expert.
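The abstract does not spell out SweetSEQer's algorithm. As a rough illustration of the general de novo idea behind "glycan graphs" (link fragment peaks whose spacing matches a monosaccharide residue mass, then treat the linked peaks as a graph), here is a minimal Python sketch. The function names, the m/z tolerance, the per-charge handling, and the `min_edges` filtering threshold are hypothetical choices made for illustration; they are not SweetSEQer's actual interface or scoring. The residue masses are standard monoisotopic values.

```python
from collections import defaultdict

# Standard monoisotopic residue masses (Da) of common monosaccharides.
RESIDUE_MASSES = {
    "Hex": 162.05282,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.07937,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.05791,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.09542,   # N-acetylneuraminic (sialic) acid
}


def find_glycan_edges(mzs, charges=(1, 2), tol=0.01):
    """Hypothetical helper: return sorted peaks and edges (i, j, residue, z)
    where peaks[j] - peaks[i] matches a residue mass divided by charge z."""
    peaks = sorted(mzs)
    edges = []
    for i, lo in enumerate(peaks):
        for j in range(i + 1, len(peaks)):
            gap = peaks[j] - lo
            for name, mass in RESIDUE_MASSES.items():
                for z in charges:
                    if abs(gap - mass / z) <= tol:
                        edges.append((i, j, name, z))
    return peaks, edges


def largest_component_edges(n_peaks, edges):
    """Count the edges in the largest connected component (union-find)."""
    parent = list(range(n_peaks))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _, _ in edges:
        parent[find(i)] = find(j)
    counts = defaultdict(int)
    for i, _, _, _ in edges:
        counts[find(i)] += 1
    return max(counts.values(), default=0)


def looks_like_glycoconjugate(mzs, min_edges=3, charges=(1, 2), tol=0.01):
    """Hypothetical filter: flag a spectrum if, for some charge state, its
    largest glycan graph contains at least min_edges residue-mass edges."""
    peaks, edges = find_glycan_edges(mzs, charges, tol)
    best = max(
        (largest_component_edges(len(peaks), [e for e in edges if e[3] == z])
         for z in charges),
        default=0,
    )
    return best >= min_edges
```

For example, a ladder of peaks spaced by HexNAc, HexNAc, Hex, Hex forms a single four-edge component and passes the filter, while unrelated peaks do not. Building components separately per charge state keeps the sketch from chaining singly and doubly charged spacings into one physically inconsistent graph; a real tool would also need isotope handling, intensity thresholds, and negative controls to calibrate `min_edges`.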
