Application of de Novo Sequencing to Large-Scale Complex Proteomics Data Sets

Because they depend on concise, predefined protein sequence databases, traditional database search algorithms perform poorly on mass spectra derived from wholly uncharacterized protein products. De novo peptide sequencing algorithms, by contrast, can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part because methods for automatically validating de novo sequencing results are lacking. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics data sets. We also present a method for accurately calibrating false discovery rates on de novo results. In addition, we present a novel algorithm, LADS, that leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. Relative to other algorithms, LADS improves sequencing accuracy on longer peptides and better discriminates between correct and incorrect sequences. Using these advances, we demonstrate accurate de novo identification of peptide sequences that cannot be identified by database search-based approaches.
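LADS benefits from fragmentation spectra whose ion types have been experimentally disambiguated, for example so that b-ion peaks can be told apart from y-ion peaks. The sketch below is not LADS. It is a minimal illustration of why such disambiguation helps, and it assumes that a complete, singly charged b-ion ladder is already known. Under that assumption, each residue can be read directly from the mass difference between consecutive peaks. The residue table, the function names `read_b_ladder` and `b_ladder`, and the 0.01 Da tolerance are hypothetical choices for illustration only. The table omits isoleucine because it is isobaric with leucine, and it uses unmodified cysteine.

```python
# Hypothetical illustration, not the LADS algorithm: read a peptide sequence
# from an already-disambiguated, complete ladder of singly charged b ions.

# Monoisotopic residue masses (Da). Isoleucine is omitted because it has the
# same mass as leucine; cysteine is assumed unmodified.
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton (Da); singly charged b ions carry one


def b_ladder(seq):
    """Hypothetical helper: ideal singly charged b-ion m/z values for seq."""
    ions, total = [], PROTON
    for aa in seq:
        total += RESIDUE_MASSES[aa]
        ions.append(total)
    return ions


def read_b_ladder(b_ions, tol=0.01):
    """Assign one residue to each gap between consecutive b-ion peaks.

    Raises ValueError if a gap matches no residue, or more than one,
    within the assumed tolerance `tol` (Da).
    """
    seq = []
    prev = PROTON  # "b0": the charging proton with no residues attached
    for mz in sorted(b_ions):
        delta = mz - prev
        matches = [aa for aa, m in RESIDUE_MASSES.items() if abs(m - delta) <= tol]
        if len(matches) != 1:
            raise ValueError(f"gap of {delta:.4f} Da does not match exactly one residue")
        seq.append(matches[0])
        prev = mz
    return "".join(seq)


print(read_b_ladder(b_ladder("SAMPLER")))
```

When the ladder is complete, the readout is unambiguous at this tolerance. Even lysine and glutamine, which differ by about 0.036 Da, are resolved. A single missing peak, however, merges two gaps into one. A Gly-Gly pair (114.04292 Da) is then indistinguishable from Asn (114.04293 Da), so the sketch reports `N` where the true sequence has `GG`. Ambiguities of this kind are one reason why additional, disambiguated spectral evidence can improve de novo accuracy.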