False Positives Reduction in Top-down Protein Informatics using Support Vector Machines

The small but persistent chance of obtaining false-positive matches [1], [2] in protein database searches [3] has always cast doubt on the reliability of the results. The situation can be improved by viewing protein data through a descriptive and a probabilistic framework together. The first stage uses the conventional, descriptive approach: top-down protein data are searched for matching proteins, and the results are scored and ranked by a top-down protein search engine. We then propose applying a Support Vector Machine (SVM) to these first-stage search results as a second-stage, probabilistic scoring system to further improve protein classification. For SVM scoring, features are extracted from the top-down data and assembled into a feature table. An SVM with a Radial Basis Function (RBF) kernel is trained on this feature table and then used to classify the test data. The resulting classification can be considered together with the previously calculated search-engine score, and the top-ranked proteins may be reordered accordingly.
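The two-stage pipeline described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The RBF-kernel SVM is trained with a simplified form of Platt's Sequential Minimal Optimization (SMO). The function names (`train_rbf_svm`, `decision_value`, `rerank`), the parameter values `C=1.0` and `gamma=0.5`, the two-column feature values and the protein identifiers are all hypothetical. The reordering rule (SVM-accepted hits first, search-engine score order kept within each group) is only one assumed way of viewing the classification together with the search-engine score.

```python
import math
import random

def rbf_kernel(x, z, gamma):
    """Radial Basis Function kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, z)))

def train_rbf_svm(X, y, C=1.0, gamma=0.5, tol=1e-3, max_passes=10, max_iter=10000, seed=0):
    """Hypothetical helper: soft-margin SVM trained with simplified SMO.

    X: feature-table rows (lists of floats); y: labels in {-1, +1}
    (assumed: +1 = correct match, -1 = false positive). Returns (alphas, b).
    """
    rng = random.Random(seed)
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alphas, b = [0.0] * n, 0.0

    def error(i):
        # E_i = f(x_i) - y_i, computed from the precomputed kernel matrix
        return sum(alphas[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            E_i = error(i)
            # Only update alpha_i if it violates the KKT conditions by more than tol
            if not ((y[i] * E_i < -tol and alphas[i] < C) or (y[i] * E_i > tol and alphas[i] > 0)):
                continue
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # random partner j != i
            E_j = error(j)
            ai_old, aj_old = alphas[i], alphas[j]
            # Bounds keeping sum(alpha * y) constant and 0 <= alpha <= C
            if y[i] != y[j]:
                L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if L == H or eta >= 0:
                continue
            aj = min(H, max(L, aj_old - y[j] * (E_i - E_j) / eta))
            if abs(aj - aj_old) < 1e-5:
                continue
            ai = ai_old + y[i] * y[j] * (aj_old - aj)
            b1 = b - E_i - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
            b2 = b - E_j - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
            alphas[i], alphas[j] = ai, aj
            b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alphas, b

def decision_value(x, X, y, alphas, b, gamma):
    """SVM output f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; f(x) > 0 is read as 'correct match'."""
    return sum(a * yi * rbf_kernel(xi, x, gamma) for a, yi, xi in zip(alphas, y, X) if a > 0) + b

def rerank(candidates, svm_values):
    """Assumed rule: SVM-accepted hits (f > 0) go ahead of rejected ones;
    within each group, the search-engine score order (highest first) is kept."""
    ranked = sorted(zip(candidates, svm_values), key=lambda cv: (cv[1] <= 0, -cv[0][1]))
    return [protein_id for (protein_id, _score), _value in ranked]

# Hypothetical feature table: two standardised features per training hit
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.3], [-1.0, -1.0], [-1.1, -0.7], [-0.8, -1.2]]
y_train = [1, 1, 1, -1, -1, -1]
alphas, b = train_rbf_svm(X_train, y_train, C=1.0, gamma=0.5)

# Hypothetical first-stage results: (protein id, search-engine score) plus features
hits = [("PROT_A", 95.0), ("PROT_B", 90.0), ("PROT_C", 80.0)]
hit_features = [[-0.9, -1.0], [1.1, 1.0], [0.8, 1.1]]
values = [decision_value(f, X_train, y_train, alphas, b, gamma=0.5) for f in hit_features]
print(rerank(hits, values))
```

The hand-written trainer only keeps the sketch dependency-free. In practice, a library SVM such as scikit-learn's `SVC(kernel="rbf")` would normally take its place, and `C` and `gamma` would be tuned by cross-validation rather than fixed as above.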

[1]  F. W. McLafferty, et al. Infrared multiphoton dissociation of large multiply charged ions for biomolecule sequencing. Analytical Chemistry, 1994.

[2]  S. A. McLuckey, et al. Ion/ion chemistry of high-mass multiply charged ions. Mass Spectrometry Reviews, 1998.

[3]  I. Guerrera, et al. Application of Mass Spectrometry in Proteomics. Bioscience Reports, 2005.

[4]  Fred W. McLafferty, et al. Hydrogen Atom Loss in Electron-Capture Dissociation: A Fourier Transform-Ion Cyclotron Resonance Study with Single Isotopomeric Ubiquitin Ions. 2002.

[5]  Jianqi Li, et al. A new strategy to filter out false positive identifications of peptides in SEQUEST database search results. Proteomics, 2007.

[6]  J. R. Yates, et al. Protein sequencing by tandem mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America, 1986.

[7]  F. W. McLafferty, et al. Electron capture dissociation of gaseous multiply charged ions by Fourier-transform ion cyclotron resonance. Journal of the American Society for Mass Spectrometry, 2001.

[8]  F. McLafferty, et al. Electron Capture Dissociation of Multiply Charged Protein Cations. A Nonergodic Process. 1998.

[9]  Karl Mechtler, et al. HPLC techniques for proteomics analysis: a short overview of latest developments. Briefings in Functional Genomics & Proteomics, 2006.

[10]  O. Sparkman. Mass Spectrometry Desk Reference. 2006.

[11]  T. D. Wood, et al. Sequence tag identification of intact proteins by matching tandem mass spectral data against sequence data bases. Proceedings of the National Academy of Sciences of the United States of America, 1996.

[12]  B. Cargile, et al. Potential for false positive identifications from large databases through tandem mass spectrometry. Journal of Proteome Research, 2004.

[13]  B. Matthews. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta, 1975.