An input variable importance definition based on empirical data probability and its use in variable selection

Variable and feature selection have become the focus of much research in application areas where datasets with tens or hundreds of thousands of variables are available. We propose a new method to score subsets of variables according to their usefulness for the performance of a given model. This method is applicable to any kind of model and to both classification and regression tasks. We assess the efficiency of the method using our results on the NIPS 2003 feature selection challenge and an example from a real application.