RAId_deNovo: using de novo based spectrum-specific statistics to combine search results from multiple scoring functions and more

Comparing or combining peptide identification results from different search methods on a firm foundation is impeded by the lack of a universal statistical standard. By providing an E-value calibration protocol, we demonstrated earlier the possibility of translating either the score or the heuristic E-value reported by any method into the textbook-defined E-value, which may serve as the universal statistical standard. This protocol, although robust, may lose spectrum-specific statistics and might require a new calibration when the experimental setup changes. RAId_deNovo circumvents these issues. We show, for a class of scoring functions, how RAId_deNovo uses the respective score histograms obtained from scoring all possible de novo peptides to assign accurate, spectrum-specific E-values, thereby creating a calibration-free protocol for accurate significance assignment and for combining search results. RAId_deNovo features four different modes: (i) compute the total number of possible peptides for a given molecular mass range, (ii) generate the score histogram given an MS/MS spectrum and a scoring function, (iii) reassign E-values for a list of candidate peptides given an MS/MS spectrum and the scoring functions chosen, and (iv) perform database searches using user-selected scoring functions. In modes (iii) and (iv), RAId_deNovo is capable of combining results from different scoring functions using spectrum-specific statistics. The web link is http://www.ncbi.nlm.nih.gov/CBBresearch/qmbp/raid_denovo/index.html. Relevant binaries for Linux, Windows, and Mac OS X are available from the same page.
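The ideas behind modes (i) to (iii) can be illustrated with a small dynamic-programming sketch. It is not the RAId_deNovo implementation. It assumes integer (nominal) residue masses, a toy additive score that counts prefix masses matching a peak list, and the textbook relation E-value = (number of candidates) x (fraction of de novo peptides scoring at least as well). The function names `count_peptides`, `peptides_in_range`, `score_histogram`, and `e_value` are hypothetical and chosen only for this illustration.

```python
from collections import Counter

# Nominal (integer) residue masses in Da for the 20 standard amino acids.
# Assumption: real tools use accurate masses on a finer grid.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
WATER = 18  # nominal mass of H2O added to the residue sum of a peptide


def count_peptides(max_residue_mass):
    """Return counts, where counts[m] is the number of residue sequences
    whose nominal residue masses sum to exactly m."""
    counts = [0] * (max_residue_mass + 1)
    counts[0] = 1  # the empty sequence
    for m in range(1, max_residue_mass + 1):
        counts[m] = sum(counts[m - r] for r in RESIDUE_MASS.values() if r <= m)
    return counts


def peptides_in_range(lo, hi):
    """Number of de novo peptides with nominal mass (water included) in [lo, hi]."""
    if hi < WATER:
        return 0
    counts = count_peptides(hi - WATER)
    return sum(counts[m - WATER] for m in range(max(lo, WATER), hi + 1))


def score_histogram(total_residue_mass, peaks):
    """Histogram of a toy additive score over all peptides with the given
    total residue mass. The score is the number of proper, non-empty
    prefix masses that appear in `peaks` (a hypothetical scoring function)."""
    peaks = set(peaks)
    hist = [Counter() for _ in range(total_residue_mass + 1)]
    hist[0][0] = 1
    for m in range(1, total_residue_mass + 1):
        # A prefix ending at mass m earns one point if m matches a peak;
        # the full-length mass itself is not scored.
        bonus = 1 if (m in peaks and m < total_residue_mass) else 0
        for r in RESIDUE_MASS.values():
            if r <= m:
                for s, n in hist[m - r].items():
                    hist[m][s + bonus] += n
    return hist[total_residue_mass]


def e_value(hist, score, n_candidates):
    """Expected number of candidates scoring >= `score` by chance, taking the
    de novo score histogram as the spectrum-specific null distribution."""
    total = sum(hist.values())
    tail = sum(n for s, n in hist.items() if s >= score)
    return n_candidates * tail / total
```

For example, `score_histogram(128, [57])` enumerates the four sequences with residue mass 128 (Q, K, GA, AG) without listing them explicitly, and `e_value` converts a raw score into a spectrum-specific significance for a hypothetical number of database candidates. Because such E-values share one definition across scoring functions, they give a common scale on which results from different scores could be compared or combined.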
