Large improvements in MS/MS-based peptide identification rates using a hybrid analysis.

We report a hybrid search method that combines database and spectral library searches and allows the error rates of the combined data to be characterized in a straightforward way. Using this method, we demonstrate significantly increased sensitivity and specificity in matching peptides to tandem mass spectra. The hybrid search increased the number of spectra that could be assigned to a peptide in a global proteomics study by 57-147% at an estimated false discovery rate of 5%, with clear room for even greater improvements. The approach combines the general utility of the consensus model spectra typical of database search methods with the accuracy of the intensity information contained in spectral libraries. Both searches use a common scoring metric, based on recent developments linking data analysis and statistical thermodynamics, which allows a conservative estimate of error rates for the combined data. We applied this approach to a proteomics analysis of Synechococcus sp. PCC 7002, a cyanobacterium that is a model organism for studies of photosynthetic carbon fixation and biofuels development. The increased specificity and sensitivity of this approach allowed us to identify many more peptides involved in the processes important for photoautotrophic growth.
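
To illustrate why a common scoring metric matters, the following minimal Python sketch pools matches from a database search and a spectral library search, keeps the best-scoring match per spectrum, and picks a score cutoff at a target false discovery rate. It is an illustration under stated assumptions, not the method used in this work: the abstract does not say how error rates were estimated, so the sketch swaps in a generic target-decoy count. The `Match` record, the field names, the "higher score is better" convention, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One peptide-spectrum match (hypothetical record layout)."""
    spectrum_id: str
    peptide: str
    score: float    # common score shared by both searches; higher is better (assumption)
    is_decoy: bool  # True if the match is to a decoy entry (assumption: target-decoy setup)
    source: str     # "database" or "library"


def best_per_spectrum(matches):
    """Keep only the highest-scoring match for each spectrum.

    Comparing scores across the two searches is only meaningful because
    both are assumed to be on the same scale.
    """
    best = {}
    for m in matches:
        current = best.get(m.spectrum_id)
        if current is None or m.score > current.score:
            best[m.spectrum_id] = m
    return list(best.values())


def accept_at_fdr(matches, max_fdr=0.05):
    """Return target matches above the lowest cutoff whose estimated FDR <= max_fdr.

    FDR at a given rank is estimated as (decoys so far) / (targets so far).
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked matches to keep
    for rank, m in enumerate(ranked, start=1):
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = rank
    return [m for m in ranked[:cutoff] if not m.is_decoy]
```

In use, the two result lists would be concatenated and passed through `best_per_spectrum` before calling `accept_at_fdr`, so that one error estimate covers the combined data rather than each search separately.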
