De Novo Sequencing of Peptides from Top-Down Tandem Mass Spectra.

De novo sequencing of proteins and peptides is one of the most important problems in mass spectrometry-driven proteomics. A variety of methods have been developed to accomplish this task from a set of bottom-up tandem (MS/MS) mass spectra. However, the more recently developed top-down technology, which is steadily gaining popularity, opens new perspectives for protein analysis and characterization and creates a need for efficient algorithms to process this kind of MS/MS data. Here, we describe a method that retrieves, from a set of top-down MS/MS spectra, long and accurate sequence fragments of the proteins contained in the sample. To this end, we outline a strategy for generating high-quality sequence tags from top-down spectra and introduce the concept of a T-Bruijn graph, which adapts the A-Bruijn graph widely used in genomics to the case of tags. The output of the proposed approach is the set of amino acid strings spelled out by optimal paths in the connected components of a T-Bruijn graph. We illustrate its performance on top-down data sets acquired from carbonic anhydrase 2 (CAH2) and the Fab region of alemtuzumab.
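The idea of a sequence tag can be illustrated with a small sketch. It is not the T-Bruijn graph construction itself, whose details are not given in this abstract; it only shows the generic step of reading a tag from a list of deconvoluted fragment masses: two peaks whose mass difference matches a residue mass are joined by an edge labeled with that amino acid, and the longest path in the resulting directed acyclic graph spells a tag. The function names, the 0.01 Da tolerance, and the reduced residue table are assumptions made for illustration.

```python
# Monoisotopic residue masses in Da (a reduced table, for illustration only).
# "L" stands for both leucine and isoleucine, which have identical mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "E": 129.04259, "F": 147.06841,
}


def build_edges(masses, tol=0.01):
    """Connect peak i to peak j > i when their mass gap matches a residue."""
    masses = sorted(masses)
    edges = {i: [] for i in range(len(masses))}
    for i, low in enumerate(masses):
        for j in range(i + 1, len(masses)):
            gap = masses[j] - low
            for aa, residue in RESIDUE_MASS.items():
                if abs(gap - residue) <= tol:
                    edges[i].append((j, aa))
    return masses, edges


def longest_tag(masses, tol=0.01):
    """Return the longest amino acid string spelled by a path of peaks."""
    masses, edges = build_edges(masses, tol)
    # Edges always point to a heavier peak, so ascending mass order is a
    # topological order; walk it backwards for a longest-path dynamic program.
    best = [""] * len(masses)
    for i in reversed(range(len(masses))):
        for j, aa in edges[i]:
            candidate = aa + best[j]
            if len(candidate) > len(best[i]):
                best[i] = candidate
    return max(best, key=len) if best else ""


# Hypothetical peak list: a G-A-L ladder starting at 1000.0 Da plus one
# unrelated peak that matches no residue gap.
peaks = [1000.0, 1057.02146, 1128.05857, 1241.14263, 1500.5]
tag = longest_tag(peaks)
```

In this toy input the only chain of residue-sized gaps runs through the first four peaks, so the unrelated peak is left out of the tag. A real pipeline would also have to score competing paths, handle both fragment ion series, and merge tags from many spectra, which is the role the abstract assigns to the T-Bruijn graph.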
