A Machine Learning Approach to Predicting Peptide Fragmentation Spectra

Accurate peptide identification from tandem mass spectrometry experiments is the cornerstone of proteomics. Although various approaches for matching database sequences with experimental spectra have been developed to date (e.g., Sequest, Mascot), the sensitivity and specificity of peptide identification have not yet reached their full potential. This is due in part to the tradeoffs that existing methods make between robustness and accuracy when faced with the non-uniform nature of peptide fragmentation and the bond cleavages induced by different mass spectrometers. Accordingly, new approaches to de novo prediction of peptide fragmentation spectra are expected to enable more accurate peptide identification. To address this problem, we used a data-driven approach to learn peptide fragmentation rules in mass spectrometry, in the form of posterior probabilities, for various fragment-ion types of doubly and triply charged precursor ions. We show that our neural network-based methodology is accurate enough to be useful for subsequent peptide database searches and that the most useful rules of fragmentation differ significantly across ion and precursor types.