General framework for developing and evaluating database scoring algorithms using the TANDEM search engine

MOTIVATION
Tandem mass spectrometry (MS/MS) identifies protein sequences using database search engines, at the core of which is a score that measures the similarity between peptide MS/MS spectra and sequences in a protein sequence database. The TANDEM application was developed as a freely available database search engine for the proteomics research community. To extend TANDEM as a platform for further research on improved database scoring methods, we modified the software so that users can redefine the scoring function and replace the native TANDEM scoring function while leaving the rest of the core application intact. Redefinition is performed at run time, so multiple scoring functions can be selected and applied from a single search engine binary. We introduce the implementation of the pluggable scoring algorithm and provide two TANDEM-compatible scoring functions: a previously described scoring function compatible with PeptideProphet, and a very simple scoring function that quantitative researchers can use as a starting point for their own development. This extension builds on the open-source TANDEM project and will facilitate research into, and dissemination of, novel algorithms for matching MS/MS spectra to peptide sequences. The pluggable scoring schema is also compatible with the related search applications P3 and Hunter, which are part of the X! suite of database matching algorithms. The pluggable scores and the X! suite of applications are all written in C++.

AVAILABILITY
Source code for the scoring functions is available from http://proteomics.fhcrc.org
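Run-time selection among several scoring functions in one binary is commonly built in C++ from an abstract scoring interface plus a name-to-factory registry. The sketch below illustrates that general pattern only. `Peak`, `ScoringFunction`, `ScoringRegistry`, `make_default_registry`, and the two toy scores are hypothetical names, not TANDEM's actual classes. Neither toy score is TANDEM's native score or the PeptideProphet-compatible score, and the 0.5 m/z tolerance is an arbitrary assumption.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal peak type; real search-engine spectrum classes are richer.
struct Peak {
    double mz;
    double intensity;
};

// Abstract interface: a scoring plug-in compares an observed spectrum
// with the theoretical fragment m/z values of one candidate peptide.
class ScoringFunction {
public:
    virtual ~ScoringFunction() = default;
    virtual std::string name() const = 0;
    virtual double score(const std::vector<Peak>& observed,
                         const std::vector<double>& theoretical_mz) const = 0;
};

// Toy score (assumption): sum of observed intensities lying within a fixed
// m/z tolerance of any theoretical fragment.
class MatchedIntensityScore : public ScoringFunction {
public:
    explicit MatchedIntensityScore(double tolerance = 0.5) : tol_(tolerance) {
        assert(tol_ > 0.0);  // a non-positive tolerance would match nothing
    }
    std::string name() const override { return "matched-intensity"; }
    double score(const std::vector<Peak>& observed,
                 const std::vector<double>& theoretical_mz) const override {
        double total = 0.0;
        for (const Peak& p : observed) {
            for (double t : theoretical_mz) {
                if (std::fabs(p.mz - t) <= tol_) {
                    total += p.intensity;
                    break;  // count each observed peak at most once
                }
            }
        }
        return total;
    }
private:
    double tol_;
};

// Toy score (assumption): number of observed peaks that match a fragment.
class SharedPeakCountScore : public ScoringFunction {
public:
    explicit SharedPeakCountScore(double tolerance = 0.5) : tol_(tolerance) {}
    std::string name() const override { return "shared-peaks"; }
    double score(const std::vector<Peak>& observed,
                 const std::vector<double>& theoretical_mz) const override {
        double count = 0.0;
        for (const Peak& p : observed) {
            for (double t : theoretical_mz) {
                if (std::fabs(p.mz - t) <= tol_) {
                    count += 1.0;
                    break;
                }
            }
        }
        return count;
    }
private:
    double tol_;
};

// Maps a name chosen at run time (e.g. from a parameter file) to a factory,
// so several scoring functions can coexist in a single binary.
class ScoringRegistry {
public:
    using Factory = std::function<std::unique_ptr<ScoringFunction>()>;
    void add(const std::string& name, Factory f) { factories_[name] = std::move(f); }
    std::unique_ptr<ScoringFunction> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end())
            throw std::invalid_argument("unknown scoring function: " + name);
        return it->second();
    }
private:
    std::map<std::string, Factory> factories_;
};

ScoringRegistry make_default_registry() {
    ScoringRegistry reg;
    reg.add("matched-intensity", [] { return std::make_unique<MatchedIntensityScore>(); });
    reg.add("shared-peaks", [] { return std::make_unique<SharedPeakCountScore>(); });
    return reg;
}
```

In a search engine organized this way, the score name would be read once from the run's parameters, `create` would be called once, and the resulting object would be handed to the peptide-matching loop. Adding a new score then means registering one more factory rather than editing the core search code.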