MS2PIP: a tool for MS/MS peak intensity prediction

MOTIVATION: Tandem mass spectrometry provides the means to match mass spectrometry signal observations with the chemical entities that generated them. The technology produces signal spectra that contain information about the chemical dissociation pattern of a peptide that was forced to fragment using methods such as collision-induced dissociation. The ability to predict these MS2 signals and to understand this fragmentation process is important for sensitive high-throughput proteomics research.

RESULTS: We present a new tool called MS2PIP for predicting the intensity of the most important fragment ion signal peaks from a peptide sequence. MS2PIP pre-processes a large dataset of confident peptide-to-spectrum matches to facilitate data-driven model induction using a random forest regression learning algorithm. The intensity predictions of MS2PIP were evaluated on several independent evaluation sets and were found to correlate significantly better with the observed fragment-ion intensities than those of the current state-of-the-art PeptideART tool.

AVAILABILITY: MS2PIP code is available for both training and predicting at http://compomics.com/.
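The evaluation described above compares predicted and observed fragment-ion intensities by how well they correlate. The abstract does not say which correlation measure was used, so the following is only a minimal sketch that assumes Pearson correlation. The function name `pearson`, the two sets of predictions and all intensity values are hypothetical and are not taken from MS2PIP or PeptideART.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical normalized intensities for the fragment ions of one peptide.
observed = [0.10, 0.40, 0.80, 0.30, 0.05]
predicted_a = [0.12, 0.35, 0.75, 0.33, 0.08]  # made-up predictions that track the observed pattern
predicted_b = [0.50, 0.10, 0.20, 0.60, 0.40]  # made-up predictions that do not

r_a = pearson(observed, predicted_a)
r_b = pearson(observed, predicted_b)
print(f"predictions A: r = {r_a:.3f}")
print(f"predictions B: r = {r_b:.3f}")
```

In a comparison of this kind, the coefficient would be computed per spectrum for each tool and the resulting distributions compared across an evaluation set. A higher coefficient means the predicted peak pattern follows the observed one more closely.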
