A novel spectral library workflow to enhance protein identifications.

Innovations in mass spectrometry-based investigations of proteome biology enable systematic characterization of the molecular details underlying pathophysiological phenotypes. However, placing large-scale raw proteomic datasets in a biological context requires high-throughput data acquisition and processing. A spectral library search engine uses previously annotated experimental spectra as references for subsequent spectral analyses. This workflow offers several advantages, including higher analytical efficiency and specificity and lower computational demands. In this study, we created a spectral matching engine to address challenges commonly associated with library search workflows. In particular, we introduce an improved sliding dot product algorithm that is robust to systematic drifts in mass measurement across spectra. Furthermore, a noise management protocol distinguishes spectral correlation arising from noise from that arising from peptide fragments. This improves the separation between target spectral matches and false matches, reducing the chance of propagating inaccurate peptide annotations from library spectra to query spectra. Moreover, the engine preserves the original spectra, which allows users to contribute data that further enhance the quality of the library. Collectively, this search engine supports reproducible data analyses using curated references, thereby broadening the accessibility of proteomics resources to biomedical investigators. This article is part of a Special Issue entitled: From protein structures to clinical applications.
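To illustrate the general idea behind a sliding dot product, the following is a minimal sketch and not the improved algorithm described in this study, whose details are not given here. It assumes unit-width m/z bins, integer bin shifts, and unit-norm intensity vectors; the function names `bin_spectrum` and `sliding_dot_product` and the parameters `bin_width`, `max_mz`, and `max_shift` are hypothetical choices made for illustration.

```python
import numpy as np


def bin_spectrum(peaks, bin_width=1.0, max_mz=2000.0):
    """Bin (m/z, intensity) pairs into a unit-norm intensity vector.

    bin_width and max_mz are illustrative assumptions, not values from the study.
    """
    n_bins = int(max_mz / bin_width) + 1
    vec = np.zeros(n_bins)
    for mz, intensity in peaks:
        idx = int(round(mz / bin_width))
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def sliding_dot_product(query, library, max_shift=2):
    """Score a query vector against a library vector over small bin shifts.

    Returns (best_score, best_shift), where best_shift is the integer number
    of bins the query was moved to obtain the highest dot product.
    """
    best_score, best_shift = -1.0, 0
    for shift in range(-max_shift, max_shift + 1):
        shifted = np.roll(query, shift)  # np.roll returns a new array
        # Zero the bins that wrapped around the ends of the vector.
        if shift > 0:
            shifted[:shift] = 0.0
        elif shift < 0:
            shifted[shift:] = 0.0
        score = float(np.dot(shifted, library))
        if score > best_score:
            best_score, best_shift = score, shift
    return best_score, best_shift


# Toy data: the query is the library spectrum with a uniform +1 m/z drift.
library_vec = bin_spectrum([(100.0, 10.0), (200.0, 20.0), (300.0, 5.0)])
query_vec = bin_spectrum([(101.0, 10.0), (201.0, 20.0), (301.0, 5.0)])
score, shift = sliding_dot_product(query_vec, library_vec)
```

In this toy case a plain dot product between the two vectors is zero because no peaks share a bin, whereas sliding the query by one bin recovers the match. A practical engine would likely need finer bins or tolerance-based peak matching; those choices are outside the scope of this sketch.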
