Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

Protein identification via database searches has become the gold standard in mass spectrometry-based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct sequencing of mass spectra is gaining interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced, along with the pitfalls and challenges of the technique. The main available tools are presented, with a focus on user-friendly open-source software that can be applied directly in everyday proteomics workflows.
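The core idea behind de novo sequencing can be illustrated with a minimal Python sketch. It assumes an idealized spectrum containing a complete, singly charged b-ion series with no noise. Under that assumption, the mass difference between two consecutive fragment ions of the same series equals the mass of one amino acid residue. The function names `b_ion_ladder` and `read_ladder` and the 0.02 Da tolerance are hypothetical choices for illustration, not taken from any particular tool. Real de novo software must additionally cope with missing peaks, noise, other ion types and multiple charge states.

```python
# Monoisotopic residue masses in Da (standard values; cysteine unmodified).
# Leucine and isoleucine are isobaric, so "L" stands for either here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # proton mass in Da


def b_ion_ladder(sequence: str) -> list[float]:
    """Return singly charged b-ion m/z values b1..bn (idealized, hypothetical helper)."""
    ladder, total = [], PROTON
    for residue in sequence:
        total += RESIDUE_MASS[residue]
        ladder.append(total)
    return ladder


def read_ladder(peaks: list[float], tol: float = 0.02) -> str:
    """Infer a sequence from consecutive mass differences in a b-ion ladder.

    Each gap is assigned to the closest residue mass if it lies within
    `tol` Da; gaps with no match (e.g. caused by a missing peak) become 'X'.
    """
    sequence = []
    previous = PROTON  # the b-ion series starts from a bare proton
    for mz in sorted(peaks):
        gap = mz - previous
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        sequence.append(best if abs(RESIDUE_MASS[best] - gap) <= tol else "X")
        previous = mz
    return "".join(sequence)
```

Calling `read_ladder(b_ion_ladder("SAMPLER"))` returns `"SAMPLER"`. Removing a single peak merges two residues into one gap that matches no residue mass (reported as `X`), a simple illustration of why incomplete fragment ion series make real spectra much harder to read than an idealized ladder.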
