Building manageable rough set classifiers

An interesting aspect of techniques for data mining and knowledge discovery is their potential for generating hypotheses by discovering underlying relationships buried in the data. However, the set of possible hypotheses is often very large, and the extracted models may become prohibitively complex. It is therefore typically desirable to consider only the "strongest" hypotheses, so that smaller models can be obtained that still retain good classificatory capabilities. This paper outlines how to develop rule-based classifiers, based on rough set theory and Boolean reasoning, that are both small and perform well. Applied to a real-world medical dataset, the final models are shown to exhibit good performance using only a subset of the available information. Furthermore, the number of resulting rules is low, which enables practical a posteriori inspection and interpretation of the models.