A machine learning approach to explore the spectra intensity pattern of peptides using tandem mass spectrometry data

Background: A better understanding of the mechanisms involved in gas-phase fragmentation of peptides is essential for developing more reliable algorithms for high-throughput protein identification using mass spectrometry (MS). Current methodologies depend predominantly on the derived m/z values of fragment ions, and the intensity information present in MS/MS spectra has not been fully exploited. Indeed, spectrum intensity information is very rarely used in the algorithms currently employed for high-throughput protein identification.

Results: In this work, a Bayesian neural network approach is employed to analyze the ion intensity information present in 13,878 different MS/MS spectra. The influence of a library of 35 features on peptide fragmentation is examined under different proton mobility conditions. Useful rules governing peptide fragmentation are found, and subsets of features that significantly influence the fragmentation pathway of peptides are characterised. An intensity model is built from the selected features, and the model can accurately predict the intensity patterns of given MS/MS spectra. The predictions include not only the mean values of spectrum intensity but also the variances, which can be used to tolerate noise and system biases within experimental MS/MS spectra.

Conclusion: The intensity patterns of fragmentation spectra are informative and can be used to analyze the influence of various characteristics of fragmented peptides on their fragmentation pathway. The features with significant influence can in turn be used to predict spectrum intensities. Such information can help develop more reliable algorithms for peptide and protein identification.
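To illustrate how a predicted variance can absorb noise when comparing predicted and observed intensities, here is a minimal sketch. It is not the model described above: it assumes, purely for illustration, that each fragment ion's predicted intensity is Gaussian with a given mean and variance. The function names and the example numbers are hypothetical.

```python
import math


def gaussian_log_likelihood(observed, mean, variance):
    """Log-density of one observed intensity under N(mean, variance).

    Assumption: the predictive distribution is Gaussian. This is an
    illustrative choice, not something stated in the abstract.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    return -0.5 * (math.log(2 * math.pi * variance)
                   + (observed - mean) ** 2 / variance)


def score_spectrum(observed, means, variances):
    """Sum the per-ion log-likelihoods; a higher score is a better match."""
    if not (len(observed) == len(means) == len(variances)):
        raise ValueError("all inputs must have the same length")
    return sum(gaussian_log_likelihood(o, m, v)
               for o, m, v in zip(observed, means, variances))


# Hypothetical normalized intensities for three fragment ions.
observed = [0.80, 0.10, 0.45]
means = [0.75, 0.20, 0.40]
variances = [0.01, 0.04, 0.01]  # the second ion is predicted less confidently

print(score_spectrum(observed, means, variances))
```

Under this assumption, the same deviation from the predicted mean costs less for an ion with a large predicted variance than for one with a small variance, which is one way a scoring function could tolerate noisy peaks.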
