A statistical method for assessing peptide identification confidence in accurate mass and time tag proteomics.

Current algorithms for quantifying peptide identification confidence in the accurate mass and time (AMT) tag approach assume that the AMT tags themselves have been correctly identified. However, the identification of AMT tags is itself uncertain, because it is based on matching LC-MS/MS fragmentation spectra to peptide sequences. In this paper, we incorporate confidence measures for the AMT tag identifications into the calculation of probabilities for correct matches to an AMT tag database, resulting in a more accurate overall measure of identification confidence for the AMT tag approach. The method is referred to as Statistical Tools for AMT Tag Confidence (STAC). STAC additionally provides a uniqueness probability (UP), to help distinguish between multiple matches to an AMT tag, and a method to calculate an overall false discovery rate (FDR). STAC is freely available for download as both a command-line application and a Windows graphical application.
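As a rough illustration of how per-match probabilities can feed an overall FDR, the sketch below uses a generic probability-based estimate: the expected fraction of incorrect matches among those accepted at a threshold. This is not taken from STAC itself. The function names, the threshold parameter, and the independence assumption used to combine the two probabilities are all hypothetical choices made for the example.

```python
from typing import Sequence


def combined_probability(p_match: float, p_tag: float) -> float:
    """Probability that a match is correct AND the AMT tag itself is correct.

    Assumption (not from the paper): the two events are treated as
    independent, so their probabilities multiply.
    """
    return p_match * p_tag


def estimated_fdr(probabilities: Sequence[float], threshold: float) -> float:
    """Expected fraction of incorrect matches among accepted matches.

    A match is accepted when its probability is >= threshold. Each accepted
    match contributes (1 - p) expected errors. This is a generic estimator,
    used here only as a stand-in for the method the abstract mentions.
    """
    accepted = [p for p in probabilities if p >= threshold]
    if not accepted:
        return 0.0  # nothing accepted, so no false discoveries to count
    return sum(1.0 - p for p in accepted) / len(accepted)


if __name__ == "__main__":
    # Hypothetical (match probability, tag probability) pairs.
    pairs = [(0.99, 0.95), (0.90, 0.80), (0.60, 0.50)]
    probs = [combined_probability(pm, pt) for pm, pt in pairs]
    print(estimated_fdr(probs, threshold=0.5))
```

Under these assumptions, lowering confidence in an AMT tag lowers the combined probability of every match to it, which in turn raises the estimated FDR for any set of accepted matches that includes them.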
