Pepitome: evaluating improved spectral library search for identification complementarity and quality assessment.

Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS), so search engines can identify peptides by comparing experimental MS/MS scans to the library spectra. Many of these algorithms use the dot product score to measure the quality of a spectrum-spectrum match (SSM). This score lacks a clear statistical interpretation and ignores discrepancies in fragment ion m/z. We developed a new spectral library search engine, Pepitome, which scores SSMs with statistical methods. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications with RNA-Seq data. Applying spectral library and database searches to the same sample revealed their complementary nature. Pepitome identifications also enabled automated quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines.
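
To make the dot product score concrete, the following is a minimal sketch of a normalized dot product between two peak lists, assuming unit-width m/z binning. It is not the scoring used by Pepitome or SpectraST; the function names, the `(m/z, intensity)` tuple layout, and the `bin_width` parameter are illustrative assumptions.

```python
import math
from collections import defaultdict


def bin_peaks(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (hypothetical helper)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def normalized_dot_product(query, library, bin_width=1.0):
    """Cosine-style similarity between two binned spectra, in [0, 1]."""
    q = bin_peaks(query, bin_width)
    lib = bin_peaks(library, bin_width)
    # Only bins present in both spectra contribute to the numerator.
    shared = sum(q[b] * lib[b] for b in q.keys() & lib.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in lib.values())
    )
    return shared / norm if norm else 0.0


# Two peaks 0.9 m/z apart land in the same bin and score as a perfect match.
same_bin = normalized_dot_product([(500.05, 10.0)], [(500.95, 10.0)])
# Peaks in different bins contribute nothing.
disjoint = normalized_dot_product([(100.2, 1.0)], [(200.2, 1.0)])
```

In this sketch, any two peaks that fall in the same bin count as a full match regardless of how far apart they are within the bin, which illustrates how a binned dot product can ignore fragment ion m/z discrepancies. The score is also a bare similarity value with no associated probability.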
