Random Reducts: A Monte Carlo Rough Set-based Method for Feature Selection in Large Datasets

An important step prior to constructing a classifier for a very large data set is feature selection. With many problems it is possible to find a subset of attributes that have the same discriminative power as the full data set. There are many feature selection methods but in none of them are Rough Set models tied up with statistical argumentation. Moreover, known methods of feature selection usually discard shadowed features, i.e. those carrying the same or partially the same information as the selected features. In this study we present Random Reducts RR-a feature selection method which precedes classification per se. The method is based on the Monte Carlo Feature Selection MCFS layout and uses Rough Set Theory in the feature selection process. On synthetic data, we demonstrate that the method is able to select otherwise shadowed features of which the user should be made aware, and to find interactions in the data set.