Improving Relevance Measures Using Genetic Programming

Relevance is a central concept in many feature selection algorithms. Given a relevance measure, a feature selection algorithm searches for a subset of features that maximises the relevance between the subset and the target concepts. This paper first shows how relevance measures that rely on posterior estimation, such as information-theoretic measures, may fail to quantify the actual utility of feature subsets in certain situations. It then proposes a solution based on Genetic Programming that can improve the usability of these measures. The paper focuses on classification problems with numeric features.