Feature Selection for Tandem Mass Spectrum Quality Assessment

Hundreds of features have been proposed in the literature for assessing the quality of tandem mass spectra. Some of these features, however, may be nearly irrelevant, and including them may degrade the performance of quality assessment. This paper introduces a two-stage support vector machine recursive feature elimination (SVM-RFE) method to select the most relevant features from those found in the literature. To verify the relevance of the selected features, classifiers are trained on them and their performance is evaluated. Classifiers trained on the selected features outperform those trained on the feature sets used in the literature, which indicates that the selected features are more relevant to spectrum quality than any of those sets.
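The abstract does not spell out the two-stage procedure, so the following is only a minimal sketch of the generic SVM-RFE loop it builds on: train a linear SVM, rank each feature by its squared weight, drop the lowest-ranked feature, and repeat. Everything here is an illustrative assumption rather than the authors' implementation: the function names (`train_linear_svm`, `svm_rfe`, `make_toy_data`), the plain subgradient-descent trainer, the hyperparameters, and the synthetic data, in which columns 0 and 1 carry the class signal and columns 2 to 5 are pure noise.

```python
import random


def train_linear_svm(X, y, features, epochs=200, lam=0.01, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by batch subgradient descent.

    Only the columns listed in `features` are used. Returns (weights, bias).
    This simple trainer is an assumption made to keep the sketch self-contained.
    """
    w = [0.0] * len(features)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xi[f] for wj, f in zip(w, features)) + b
            if yi * score < 1:  # sample violates the margin: hinge loss is active
                for j, f in enumerate(features):
                    grad_w[j] -= yi * xi[f] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination driven by squared SVM weights.

    Returns (remaining, eliminated); `eliminated` lists feature indices in the
    order they were dropped, least relevant first.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        w, _ = train_linear_svm(X, y, remaining)
        # Ranking criterion: the feature with the smallest w_j^2 is dropped.
        worst = min(range(len(remaining)), key=lambda j: w[j] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


def make_toy_data(n=200, seed=0):
    """Hypothetical data: 2 informative features followed by 4 noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        informative = [label + rng.gauss(0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0, 1) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


X, y = make_toy_data()
selected, eliminated = svm_rfe(X, y, n_keep=2)
print("selected features:", selected)
print("elimination order:", eliminated)
```

On this toy data the two informative columns are expected to survive elimination. In a real setting the rows of `X` would be per-spectrum feature vectors and `y` the quality labels, and the surviving features would then be checked by training and evaluating a classifier on them, as the abstract describes.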
