DisCoSet: Discovery of Contrast Sets to Reduce Dimensionality and Improve Classification

Abstract

Traditionally, contrast set mining aims to find a set of rules that best distinguish the instances of different user-defined groups. Contrast sets are conjunctions of attribute-value pairs that are significantly more frequent in one group than in other groups. Typically, these contrast sets are extracted from categorical data or discretized numerical data. Existing rule-based contrast set methods require user-defined thresholds to select the contrast sets. In this paper, we propose a greedy algorithm, called DisCoSet, that incrementally finds a minimum set of local features that best distinguishes a class from other classes without resorting to discretization. The discovered contrast sets reduce the dimensionality of the feature vectors considerably and improve the classification accuracy significantly. We show that the proposed algorithm reduces the dimensionality of class instances by 40%-97% of the original length and yet improves classification accuracy by 10%-24% using different type...
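To make the traditional notion of a contrast set concrete, the following is a minimal sketch of how the support of a conjunction of attribute-value pairs can be compared across two groups. It illustrates only the classical rule-based definition given above, not the DisCoSet algorithm itself. The function names `support` and `support_difference` and the toy records are hypothetical, and a real miner would also apply a significance test rather than rely on the raw difference alone.

```python
from typing import Dict, List

# A record maps attribute names to categorical values (illustrative type alias).
Record = Dict[str, str]


def support(records: List[Record], pattern: Record) -> float:
    """Fraction of records matching every attribute-value pair in `pattern`."""
    if not records:
        return 0.0
    hits = sum(
        all(r.get(attr) == val for attr, val in pattern.items())
        for r in records
    )
    return hits / len(records)


def support_difference(group_a: List[Record],
                       group_b: List[Record],
                       pattern: Record) -> float:
    """Support of `pattern` in group A minus its support in group B."""
    return support(group_a, pattern) - support(group_b, pattern)


# Toy data (hypothetical): two user-defined groups of categorical records.
group_a = [
    {"color": "red", "size": "large"},
    {"color": "red", "size": "large"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "large"},
]
group_b = [
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "large"},
    {"color": "blue", "size": "small"},
]

# Candidate contrast set: a conjunction of two attribute-value pairs.
pattern = {"color": "red", "size": "large"}

if __name__ == "__main__":
    print(support(group_a, pattern))
    print(support(group_b, pattern))
    print(support_difference(group_a, group_b, pattern))
```

In this toy setting the candidate pattern covers half of the first group and none of the second. A threshold-based method would accept or reject it depending on a user-chosen minimum difference, which is the kind of parameter the abstract says existing methods require.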