Knowledge and Information Systems

With the proliferation of extremely high-dimensional data, feature selection algorithms have become indispensable components of the learning process. Strangely, despite extensive work on the stability of learning algorithms, the stability of feature selection algorithms has been relatively neglected. This study is an attempt to fill that gap by quantifying the sensitivity of feature selection algorithms to variations in the training set. We assess the stability of feature selection algorithms based on the stability of the feature preferences that they express in the form of weights-scores, ranks, or a selected feature subset. We examine a number of measures to quantify the stability of feature preferences and propose an empirical way to estimate them. We perform a series of experiments with several feature selection algorithms on a set of proteomics datasets. The experiments allow us to explore the merits of each stability measure and create stability profiles of the feature selection algorithms. Finally, we show how stability profiles can support the choice of a feature selection algorithm.

[1]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[2]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[3]  Peter D. Turney Technical Note: Bias and the Quantification of Stability , 2004, Machine Learning.

[4]  Geoff Holmes,et al.  Benchmarking Attribute Selection Techniques for Discrete Class Data Mining , 2003, IEEE Trans. Knowl. Data Eng..

[5]  Elie Bienenstock,et al.  Neural Networks and the Bias/Variance Dilemma , 1992, Neural Computation.

[6]  T. Poggio,et al.  Prediction of central nervous system embryonal tumour outcome based on gene expression , 2002, Nature.

[7]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[8]  Pedro M. Domingos A Unifeid Bias-Variance Decomposition and its Applications , 2000, ICML.

[9]  Melanie Hilario,et al.  Mining mass spectra for diagnosis and biomarker discovery of cerebral accidents , 2004, Proteomics.

[10]  Peter D. Turney Technical note: Bias and the quantification of stability , 1995, Machine Learning.

[11]  Keki B. Irani,et al.  Multi-interval discretization of continuos attributes as pre-processing for classi cation learning , 1993, IJCAI 1993.

[12]  WestonJason,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002 .

[13]  E. Petricoin,et al.  Serum proteomic patterns for detection of prostate cancer. , 2002, Journal of the National Cancer Institute.

[14]  E. Petricoin,et al.  Use of proteomic patterns in serum to identify ovarian cancer , 2002, The Lancet.

[15]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[16]  Marko Robnik-Sikonja,et al.  Theoretical and Empirical Analysis of ReliefF and RReliefF , 2003, Machine Learning.

[17]  Pedro M. Domingos A Unifeid Bias-Variance Decomposition and its Applications , 2000, ICML.

[18]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[19]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[20]  Terri K. Attwood,et al.  METIS: multiple extraction techniques for informative sentences , 2005, Bioinform..

[21]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[22]  Pedro M. Domingos A Unified Bias-Variance Decomposition for Zero-One and Squared Loss , 2000, AAAI/IAAI.