Modeling and Characterization of Multi-charge Mass Spectra for Peptide Sequencing

Peptide sequencing from tandem mass spectrometry (MS/MS) data is an important and challenging problem in proteomics. In this paper, we address peptide sequencing for multi-charge spectra. Most current peptide sequencing algorithms handle only spectra of charge 1 or 2 and were not designed for higher-charge spectra. We characterize multi-charge spectra by generalizing existing models. Using these new models, we analyzed spectra with charges 1 to 5 from the GPM [8] datasets. Our analysis shows that higher-charge peaks are present and contribute significantly to predicting the complete peptide. These peaks also help explain why existing algorithms perform poorly on multi-charge spectra. We further propose a new de novo algorithm for multi-charge spectra based on the new models. Experimental results show that it performs well on spectra of all charges, and particularly well on multi-charge spectra.
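The notion of a "higher-charge peak" comes down to standard mass arithmetic. A fragment ion whose residue mass is M and which carries z protons appears at m/z = (M + z·m_proton)/z, so the same b or y fragment can produce peaks at several positions in one spectrum. The sketch below lists theoretical b/y ion m/z values across charge states. It is an illustrative example only and not the authors' model. The function name `fragment_mz` and the `max_charge` parameter are hypothetical, and the code assumes unmodified residues and standard monoisotopic masses.

```python
# Standard monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton, Da
WATER = 18.010565  # mass of H2O, Da (retained by y ions)


def fragment_mz(peptide: str, max_charge: int) -> list[tuple[str, int, float]]:
    """Hypothetical helper: return (ion label, charge, m/z) for every b and y
    ion of `peptide` at each charge state 1..max_charge."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(peptide)
    ions = []
    for i in range(1, n):
        b_mass = sum(masses[:i])          # N-terminal prefix residues
        y_mass = sum(masses[i:]) + WATER  # C-terminal suffix residues plus water
        for z in range(1, max_charge + 1):
            # Each added proton contributes both mass and charge.
            ions.append((f"b{i}", z, (b_mass + z * PROTON) / z))
            ions.append((f"y{n - i}", z, (y_mass + z * PROTON) / z))
    return ions


if __name__ == "__main__":
    for label, z, mz in fragment_mz("PEPTIDE", max_charge=3):
        print(f"{label}^{z}+  {mz:.4f}")
```

A de novo algorithm that considers only z = 1 (and perhaps z = 2) cannot explain the doubly or triply charged copies of each fragment. This is consistent with the observation above that charge-1/2 algorithms underperform on multi-charge spectra.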