Improved Protein Inference from Multiple Protease Bottom-Up Mass Spectrometry Data

Peptides detected by tandem mass spectrometry (MS/MS) in bottom-up proteomics serve as proxies for the proteins expressed in the sample. Protein inference is routinely applied to these peptides to generate a plausible list of candidate protein identifications. Using multiple proteases for parallel protein digestions expands sequence coverage, provides additional peptide identifications, and increases the probability of identifying peptides that are unique to a single protein, all of which are valuable for protein inference. We have developed and implemented a multi-protease protein inference algorithm in MetaMorpheus, a bottom-up search software program. The algorithm calculates protease-specific q-values and preserves the association between each peptide sequence and its protease of origin. This integrated multi-protease protein inference algorithm (integrated approach) provides more accurate results than aggregating the results of separate analyses of the peptide identifications produced by each protease (separate approach) in MetaMorpheus, and more accurate results than those obtained using Fido, ProteinProphet, or DTASelect2. The integrated approach in MetaMorpheus decreases the ambiguity of the protein group list, reduces the frequency of erroneous identifications, and increases the number of post-translational modifications identified. It also improves efficiency by combining multi-protease search and protein inference in a single software program.
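The two ideas named above, protease-specific q-values and keeping each peptide tied to its protease of origin, can be illustrated with a minimal Python sketch. It is not the MetaMorpheus implementation: it assumes a standard target-decoy estimate of the false discovery rate, ignores tied scores, and the names `PSM`, `protease_specific_q_values`, and `confident_peptides` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class PSM(NamedTuple):
    """A peptide-spectrum match (hypothetical record for illustration)."""
    sequence: str
    protease: str
    score: float
    is_decoy: bool


def protease_specific_q_values(psms):
    """Return {psm: q_value}, estimating FDR separately for each protease.

    Assumption: a simple target-decoy estimate, FDR = decoys / targets,
    computed over PSMs ranked by descending score within one protease.
    """
    by_protease = defaultdict(list)
    for psm in psms:
        by_protease[psm.protease].append(psm)

    q_values = {}
    for group in by_protease.values():
        ranked = sorted(group, key=lambda p: p.score, reverse=True)
        targets = decoys = 0
        fdrs = []
        for psm in ranked:
            if psm.is_decoy:
                decoys += 1
            else:
                targets += 1
            fdrs.append(decoys / max(targets, 1))
        # The q-value is the smallest FDR at this rank or any lower-scoring
        # rank, which makes it monotone in the score.
        running_min = float("inf")
        for psm, fdr in zip(reversed(ranked), reversed(fdrs)):
            running_min = min(running_min, fdr)
            q_values[psm] = running_min
    return q_values


def confident_peptides(psms, q_threshold=0.01):
    """Keep target peptides as (sequence, protease) pairs so that the
    protease of origin is still available to a later inference step."""
    q = protease_specific_q_values(psms)
    return {
        (p.sequence, p.protease)
        for p in psms
        if not p.is_decoy and q[p] <= q_threshold
    }
```

Because the same sequence produced by two proteases is kept as two distinct `(sequence, protease)` pairs, a downstream inference step can test whether a peptide is unique to one protein under the digestion rules of the protease that generated it, rather than under a pooled rule set.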
