The Null-Test for peptide identification algorithm in Shotgun proteomics.

This study proposes a general evaluation strategy, named the Null-Test, for peptide identification algorithms in shotgun proteomics. Based on random matching, the Null-Test can be used to check whether an algorithm tends to make mistakes or contains latent bugs, faults, or errors, and thus to validate the reliability of the identification algorithm. Unfortunately, none of the five well-known identification software packages passed the most stringent Null-Test. PatternLab performed well in both the Null-Test and routine searches, owing to a sound design that keeps overfitting under control. A fuzzy-logic-based method, presented as an alternative candidate strategy, passed the Null-Test and showed competitive efficiency in peptide identification. Filtering the results at an appropriate false discovery rate (FDR) increases the number of discoveries in an experiment, but at the cost of losing strict control of Type I errors. It is therefore necessary to apply more stringent criteria when designing or analyzing an identification algorithm or software package, because such criteria make latent bugs, faults, and errors easier to uncover. We recommend an independent search against a random database, combined with statistical theory, to estimate the FDR of the identified results accurately.

BIOLOGICAL SIGNIFICANCE

Over the past decades, considerable effort has been devoted to developing sensitive algorithms for peptide identification in shotgun proteomics. However, little attention has been paid to controlling the reliability of identification algorithms at the design stage. The Null-Test, based on random matching, can be used to check whether an algorithm tends to make mistakes or contains latent bugs, faults, or errors. However, none of the five well-known identification software packages passed the most stringent Null-Test in the present study, a result that should be taken seriously.
Nevertheless, a candidate strategy based on fuzzy logic demonstrates that an identification algorithm can pass the Null-Test. PatternLab shows that controlling overfitting early in the design is valuable for building an efficient algorithm.
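The Null-Test idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes that the Null-Test searches the spectra against a randomly generated database, so that every reported match is false by construction, and that the FDR at a score threshold is estimated from an independent search against a second, equally sized random database as the ratio of hits above that threshold. The functions `estimate_fdr`, `accepted_at_fdr` and `null_test`, the Gaussian score model, and the 1% FDR level are hypothetical choices, and "most stringent" is taken here, as an assumption, to mean that not a single identification may pass.

```python
import random


def estimate_fdr(target_scores, random_scores, threshold):
    """Estimate the FDR of the target hits scoring at or above `threshold`.

    Assumption: the random database is the same size as the searched
    database, so the number of random hits above the threshold estimates
    the number of false hits among the target hits above it.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_random = sum(s >= threshold for s in random_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so no false discoveries by convention
    return min(1.0, n_random / n_target)


def accepted_at_fdr(target_scores, random_scores, fdr_level):
    """Return the target scores accepted at the lowest threshold whose
    estimated FDR does not exceed `fdr_level` (the largest accepted set)."""
    for threshold in sorted(set(target_scores)):
        if estimate_fdr(target_scores, random_scores, threshold) <= fdr_level:
            return [s for s in target_scores if s >= threshold]
    return []


def null_test(null_scores, random_scores, fdr_level=0.01):
    """Hypothetical Null-Test: `null_scores` come from searching the spectra
    against a random database, so every accepted identification is false.
    The stringent form assumed here passes only if nothing is accepted.
    Returns (passed, number_of_false_identifications)."""
    false_ids = accepted_at_fdr(null_scores, random_scores, fdr_level)
    return len(false_ids) == 0, len(false_ids)


if __name__ == "__main__":
    rng = random.Random(0)
    # Null data: both searches score random matches drawn from one distribution.
    null_scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    random_scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    passed, n_false = null_test(null_scores, random_scores)
    print(f"passed={passed}, false identifications={n_false}")
```

Because both score lists come from the same distribution, a correct procedure should accept nothing. Yet whenever a random "target" hit outscores every random decoy hit, the estimated FDR at the top of the list is zero and a few false identifications slip through, which is the kind of failure a stringent Null-Test is designed to expose.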
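The trade-off between FDR filtering and Type I error control noted in the abstract can be illustrated with two standard multiple-testing procedures: the Benjamini-Hochberg step-up procedure, which controls the FDR, and the Bonferroni correction, which controls the family-wise error rate, i.e. the probability of making even one Type I error. The sketch below assumes that each candidate identification has already been assigned a p-value; the p-values in the example are invented for illustration and are not results from this study.

```python
def bonferroni(p_values, alpha=0.05):
    """Indices rejected by the Bonferroni correction (controls the FWER)."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure (controls the FDR)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


if __name__ == "__main__":
    p = [0.001, 0.008, 0.012, 0.02, 0.03, 0.2, 0.5, 0.9]
    print("Bonferroni:", bonferroni(p))  # Bonferroni: [0]
    print("Benjamini-Hochberg:", benjamini_hochberg(p))  # Benjamini-Hochberg: [0, 1, 2, 3, 4]
```

On the same p-values, Bonferroni accepts one identification and Benjamini-Hochberg accepts five: the FDR procedure yields more discoveries because it tolerates a controlled proportion of false ones instead of guarding against any single Type I error.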