Rough Set-Based SVM Classifier for Text Categorization

The efficiency of feature selection affects overall classifier performance in text categorization. This paper proposes a new classification method, the Rough Support Vector Machine, which combines the indiscernibility-based discriminating capability of rough set theory with the good generalization ability of support vector machines. Rough set theory is employed as an attribute reduction tool that removes redundancy from the original attribute set. The resulting reduct, a low-dimensional set of attributes with minimal loss of information, is then combined according to our new attribute creation method and used as input to a support vector classifier. The efficiency of the approach is demonstrated experimentally on a text classification benchmark dataset and compared with the traditional rough set method and other machine learning methods.
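
As a rough illustration of the attribute reduction step, the following Python sketch computes a greedy approximation of a rough set reduct on a toy term-presence table. The function names, the toy data, and the greedy dependency-based heuristic are assumptions made for illustration only; they are not the paper's algorithm, and the attribute creation method and the support vector classifier stage are not shown.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Fraction of rows in the positive region, i.e. in blocks with a single label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def greedy_reduct(rows, labels):
    """Hypothetical heuristic: add the attribute that raises the dependency degree
    most, until the dependency of the full attribute set is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (a for a in all_attrs if a not in chosen),
            key=lambda a: dependency(rows, labels, chosen + [a]),
        )
        chosen.append(best)
    return sorted(chosen)


# Toy data (assumed): each row marks the presence of three terms in a document.
# Term 0 determines the label, term 1 duplicates term 0, term 2 is noise.
rows = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)]
labels = ["sport", "sport", "politics", "politics"]

reduct = greedy_reduct(rows, labels)
print(reduct)  # [0]
```

In this toy table a single attribute already separates the two classes, so the redundant and noisy terms are dropped. In a pipeline of the kind described above, only the retained attributes would be passed on to the classifier.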