Design and Analysis of the NIPS2003 Challenge

In 2003 we organized a benchmark of feature selection methods, whose results are summarized and analyzed in this chapter. The top-ranking entrants of the competition describe their methods and results in more detail in the following chapters. We provided participants with five datasets from different application domains and called for classification results using a minimal number of features. Participants made on-line submissions on two test sets: a validation set and a “final” test set. Performance on the validation set was reported immediately to the participant, while performance on the final test set was revealed only at the end of the competition. The competition ran for 13 weeks and attracted 78 research groups. In total, 1863 entries were made on the validation sets during the development period, and 135 entries on all test sets for the final competition. The winners used a combination of Bayesian neural networks with ARD priors and Dirichlet diffusion trees. Other top entries used a variety of feature selection methods, combining filters, wrappers, and/or embedded methods, with Random Forests, kernel methods, or neural networks as classification engines. The classification engines most often used after feature selection are regularized kernel methods, including SVMs. The results of the benchmark (including the predictions made by the participants and the features they selected) and the scoring software are publicly available. The benchmark remains open at http://www.nipsfsc.ecs.soton.ac.uk/ for post-challenge submissions to stimulate further research.
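The pattern common to many top entries described above, a feature selection filter followed by a regularized kernel classifier, can be sketched with scikit-learn. This is a minimal illustration, not the implementation of any challenge entry; the synthetic dataset, the univariate F-test filter, and all hyperparameter values are assumptions chosen for the example.

```python
# Minimal sketch of a filter + SVM pipeline, in the spirit of many
# challenge entries; data and hyperparameters are illustrative only.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data: many features, only a few of them informative.
X, y = make_classification(n_samples=400, n_features=500,
                           n_informative=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0)

pipe = Pipeline([
    ("filter", SelectKBest(f_classif, k=20)),  # univariate filter
    ("clf", SVC(kernel="linear", C=1.0)),      # regularized kernel classifier
])
pipe.fit(X_train, y_train)
print("validation accuracy:", pipe.score(X_val, y_val))
print("features kept:", pipe.named_steps["filter"].get_support().sum())
```

Fitting the filter inside the pipeline, rather than on the full dataset, keeps the validation-set estimate honest: feature scores are computed from training data only.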
