ProteinInferencer: Confident protein identification and multiple experiment comparison for large scale proteomics projects.

Shotgun proteomics generates valuable information from large-scale and targeted protein characterizations, including protein expression, quantification, post-translational modifications (PTMs), localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometry because peptides are more readily separated, ionized, and fragmented. Peptide amino acid sequences are interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are used both to establish which proteins were present in the original mixture and to quantify protein abundance. Two major issues arise when assigning peptides to their originating proteins. The first is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis; the second is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which performs protein inference on both small- and large-scale datasets and produces a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which allows each protein identification to be evaluated. This article is part of a Special Issue entitled: Computational Proteomics.
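The first issue, FDR inflation when datasets are combined, can be illustrated with a small sketch. It is not ProteinInferencer's algorithm; it assumes a generic target-decoy setup in which decoy proteins carry a hypothetical `DECOY_` prefix and the protein FDR is estimated as the number of decoy proteins divided by the number of target proteins. The function name, accession format, and toy experiments are all illustrative assumptions.

```python
# Minimal sketch (assumption: generic target-decoy FDR, not ProteinInferencer itself).
DECOY_PREFIX = "DECOY_"  # hypothetical naming convention for decoy entries


def protein_fdr(identified):
    """Estimate protein-level FDR as (# decoy proteins) / (# target proteins)."""
    proteins = set(identified)
    decoys = sum(1 for p in proteins if p.startswith(DECOY_PREFIX))
    targets = len(proteins) - decoys
    return decoys / targets if targets else 0.0


# Two toy experiments, each filtered to 1% protein FDR on its own.
# True proteins largely overlap between runs; decoy hits are random and do not.
exp_a = {f"P{i:03d}" for i in range(100)} | {"DECOY_X1"}
exp_b = {f"P{i:03d}" for i in range(5, 105)} | {"DECOY_X2"}

# Naively taking the union accumulates decoys faster than new targets.
combined = exp_a | exp_b

print(f"A: {protein_fdr(exp_a):.3f}")
print(f"B: {protein_fdr(exp_b):.3f}")
print(f"A+B: {protein_fdr(combined):.3f}")
```

In this toy case each experiment sits at 1% on its own, while the union holds 105 targets and 2 decoys, so the estimated FDR nearly doubles. This is why a combined analysis generally needs the FDR to be re-estimated on the merged protein list rather than inherited from the per-experiment filters.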
