MS2DB: A Mass-Based Hashing Algorithm for the Identification of Disulfide Linkage Patterns in Protein Utilizing Mass Spectrometric Data

Knowing the number and location of a protein's disulfide bonds helps in understanding its tertiary structure and biological function. Mass spectrometric (MS) experimental procedures that produce spectra of the protein's disulfide-bonded peptides allow an initial identification of the bonded cysteine pairings. The algorithmic problem then becomes how to match a theoretical mass space of all possible bonded peptides against the MS data. Our solution, MSHashID, uses the expected amino acid mass in combination with a hash structure to improve the time complexity of making an identification from worse than O(n^2) to approximately O(n), where n is the size of the mass space. We have developed a software package, MS2DB, which includes an implementation of this algorithm. Experiments using published data show that the MSHashID algorithm efficiently makes the correct initial identifications, which can then be confirmed using tandem mass spectrometry (MS/MS).
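To illustrate the general idea of replacing pairwise mass comparison with a hash lookup, the following is a minimal sketch and not the MSHashID implementation itself. It assumes a simple scheme in which theoretical masses are bucketed by integer mass bin, so that each observed mass is compared only against the few candidates in neighboring bins. The function names (`build_mass_index`, `match_masses`), the parameters `bin_width` and `tolerance`, and the peptide labels and mass values are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def build_mass_index(theoretical, bin_width=1.0):
    """Bucket (label, mass) pairs by mass bin (assumed hashing scheme)."""
    index = defaultdict(list)
    for label, mass in theoretical:
        index[math.floor(mass / bin_width)].append((label, mass))
    return index


def match_masses(index, observed, tolerance=0.5, bin_width=1.0):
    """Return (observed, label, theoretical) triples within the tolerance.

    Each observed mass probes only the bins that overlap
    [obs - tolerance, obs + tolerance], so the work per observed mass
    depends on bin occupancy rather than on the size of the mass space.
    """
    matches = []
    for obs in observed:
        lo = math.floor((obs - tolerance) / bin_width)
        hi = math.floor((obs + tolerance) / bin_width)
        for key in range(lo, hi + 1):
            for label, mass in index.get(key, ()):
                if abs(mass - obs) <= tolerance:
                    matches.append((obs, label, mass))
    return matches


# Hypothetical theoretical mass space: one entry per candidate cysteine pairing.
theoretical = [("C1-C4", 1500.2), ("C2-C3", 2210.7), ("C1-C3", 1845.9)]
# Hypothetical observed precursor masses from the MS data.
observed = [1500.4, 1999.0, 2210.3]

index = build_mass_index(theoretical)
matches = match_masses(index, observed)
for obs, label, mass in matches:
    print(f"observed {obs} matches {label} (theoretical {mass})")
```

Under these assumptions, building the index takes one pass over the mass space and each lookup touches a small number of bins, which is how a hash structure can bring matching close to linear time when bins are sparsely populated. The bin width and tolerance would in practice depend on instrument accuracy.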