Using the entrapment sequence method as a standard to evaluate key steps of proteomics data analysis process

Background: The mass spectrometry-based technical pipeline provides a high-throughput, high-sensitivity and high-resolution platform for post-genomic biology. Different tools implement a variety of models and algorithms to improve proteomics data analysis. The target-decoy search strategy has become the most popular way to control false identifications at the peptide and protein levels. Although this strategy can estimate the false discovery rate (FDR) within a dataset, it cannot directly evaluate the false positive matches among the target identifications.

Results: As a supplement to the target-decoy strategy, the entrapment sequence method was introduced to assess two key steps of the mass spectrometry data analysis process: database search engines and quality control methods. Using the entrapment sequences as the standard, we evaluated five database search engines, for both their original scores and reprocessed scores, as well as four quality control methods, in terms of both the quantity and the quality of identifications. Our results showed that the most recently developed search engine, MS-GF+, and the Percolator-embedded quality control method, PepDistiller, performed best among the search engines and the quality control methods, respectively. When combined with efficient quality control methods, the search engines can overcome the low sensitivity of their original scores. Moreover, based on the entrapment sequence method, we showed that filtering the identifications separately can increase the number of identified peptides while improving the confidence level.

Conclusion: In this study, we have shown that the entrapment sequence method can be a useful strategy for assessing the key steps of the mass spectrometry data analysis process. Its applications can be extended to all steps of the common workflow, such as protein assembly methods and data integration methods.
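The difference between the two estimates described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the `PSM` record, the function names, the higher-is-better score, and the simple decoys/targets FDR formula are all assumptions made for illustration. The decoy-based function estimates the FDR from decoy matches, while the entrapment-based function counts accepted target matches that hit sequences known to be absent from the sample, which gives a direct (lower-bound) view of false positives among the targets.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (illustrative only)."""
    score: float          # assumed: higher score means a better match
    is_decoy: bool        # matched a decoy (e.g. reversed) sequence
    is_entrapment: bool   # matched a target-side entrapment sequence


def accepted(psms: Iterable[PSM], threshold: float) -> List[PSM]:
    """Keep the matches whose score reaches the threshold."""
    return [p for p in psms if p.score >= threshold]


def decoy_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Estimate FDR as (#decoy matches) / (#target matches) above threshold."""
    kept = accepted(psms, threshold)
    targets = sum(1 for p in kept if not p.is_decoy)
    decoys = sum(1 for p in kept if p.is_decoy)
    return decoys / targets if targets else 0.0


def entrapment_rate(psms: Iterable[PSM], threshold: float) -> float:
    """Fraction of accepted target matches that hit entrapment sequences.

    Entrapment hits are known to be wrong, so this fraction is a direct,
    lower-bound indicator of false positives among target identifications.
    """
    targets = [p for p in accepted(psms, threshold) if not p.is_decoy]
    if not targets:
        return 0.0
    return sum(1 for p in targets if p.is_entrapment) / len(targets)


# Toy data, invented purely for illustration.
psms = [
    PSM(9.0, False, False),
    PSM(8.0, False, False),
    PSM(7.0, False, True),   # target-side match to an entrapment sequence
    PSM(6.0, True, False),   # decoy match
    PSM(5.0, False, False),
    PSM(4.0, True, False),   # decoy match below the threshold
]

print(decoy_fdr(psms, 5.0))
print(entrapment_rate(psms, 5.0))
```

Comparing the two numbers across score thresholds, search engines, or quality control methods is one way such a standard could be used to judge whether the reported FDR reflects the actual error among accepted target matches.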
