Dependency-based feature selection for clustering symbolic data

Feature selection is a central problem in data analysis that has received significant attention from several disciplines, such as machine learning and pattern recognition. However, most research has focused on supervised tasks, and little attention has been paid to unsupervised learning. In this paper, we introduce an unsupervised feature selection method for clustering tasks on symbolic data. Our method rests on the assumption that, in the absence of class labels, features that exhibit low dependencies with the remaining features can be deemed irrelevant. Experiments with several data sets show that the proposed approach detects completely irrelevant features and, in addition, removes other features without significantly degrading the performance of the clustering algorithm.
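The core idea, scoring each feature by how strongly it depends on the others and discarding weakly dependent ones, can be illustrated with a small sketch. The abstract does not name the dependency measure or the selection rule, so this example assumes empirical mutual information between pairs of symbolic features, averages it over all other features, and keeps features whose mean score exceeds a user-chosen threshold. The function names (`mutual_information`, `dependency_scores`, `select_features`) and the threshold rule are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two symbolic sequences.

    Assumption: mutual information stands in for the paper's unspecified
    dependency measure.
    """
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * log(c * n / (px[a] * py[b]))
    return mi


def dependency_scores(rows):
    """Mean mutual information of each feature with every other feature."""
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    d = len(columns)
    scores = []
    for i in range(d):
        total = sum(
            mutual_information(columns[i], columns[j]) for j in range(d) if j != i
        )
        scores.append(total / (d - 1))
    return scores


def select_features(rows, threshold):
    """Indices of features whose mean dependency exceeds `threshold`.

    Hypothetical selection rule: the paper's actual criterion may differ.
    """
    return [i for i, s in enumerate(dependency_scores(rows)) if s > threshold]


# Features 0 and 1 determine each other; feature 2 is independent noise.
rows = [
    ("red", "small", "x"),
    ("red", "small", "y"),
    ("blue", "large", "x"),
    ("blue", "large", "y"),
]
print(select_features(rows, threshold=0.1))  # [0, 1]
```

In this toy data, features 0 and 1 share log 2 nats of mutual information, while feature 2 has zero mutual information with both, so only feature 2 is dropped. Averaging over all other features is one simple aggregation choice; a maximum or a weighted sum would be equally plausible readings of "low dependencies with the rest of features".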