RISC: A New Filter Approach for Feature Selection from Proteomic Data

This paper proposes a novel feature selection technique for SELDI-TOF spectrum data. The new technique, called RISC (Relevance Index by Sample Counting), measures the relevance of a feature by how well each sample's value for that feature partitions the samples of the opposite class. We also propose a heuristic search method that uses these relevance parameters to obtain the optimal feature set. Because the technique has low computational complexity and consists of simple counting operations, it is fast even on extremely high-dimensional datasets such as SELDI spectra. In experiments on three clinical datasets from the NCI/CCR and FDA/CBER Clinical Proteomics Program Databank (Ovarian 4-3-02, Ovarian 7-8-02, and Prostate), the new technique performs comparably to conventional feature selection techniques.
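
The abstract does not give the RISC formula, so the following is only a minimal sketch of what a counting-based filter of this kind might look like, not the authors' definition. It assumes two classes labelled 0 and 1, and it assumes that a sample's "discriminating power" for a feature is the larger fraction of opposite-class samples lying strictly on one side of that sample's value. The names `counting_relevance` and `rank_features` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def counting_relevance(values, labels):
    """Illustrative counting-based relevance of one feature.

    ASSUMPTION: this scoring rule is a guess at the general idea,
    not the RISC index defined in the paper.
    """
    pos = sorted(v for v, y in zip(values, labels) if y == 1)
    neg = sorted(v for v, y in zip(values, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes need at least one sample")

    def power(v, opposite):
        # Count opposite-class samples strictly below and strictly above v.
        below = bisect_left(opposite, v)
        above = len(opposite) - bisect_right(opposite, v)
        # Fraction of the opposite class that v separates onto one side.
        return max(below, above) / len(opposite)

    scores = [power(v, neg) for v in pos] + [power(v, pos) for v in neg]
    return sum(scores) / len(scores)


def rank_features(X, labels):
    """Return feature indices sorted from most to least relevant."""
    n_features = len(X[0])
    scores = [
        counting_relevance([row[j] for row in X], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    X = [[5, 1], [6, 2], [7, 3], [1, 1], [2, 2], [3, 3]]
    labels = [1, 1, 1, 0, 0, 0]
    print(rank_features(X, labels))
```

Under these assumptions each feature costs one sort plus a binary search per sample, which is in keeping with the abstract's claim that the method relies on simple counting. A heuristic search over the ranked features, as the abstract mentions, would sit on top of such scores and is not sketched here.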
