The Effect of Combining Different Feature Selection Methods on Arabic Text Classification

Feature selection is one of several factors that affect the performance of text classification systems. It aims to choose a representative subset of the full feature set in order to reduce the complexity of the classification problem. Usually a single feature selection method is used. For English, several studies have examined the combination of different feature selection methods; to the best of our knowledge, no such attempts have been reported for Arabic text classification. In this study, we examined the effect of combining five feature selection methods, namely CHI, IG, GSS, NGL and RS, on Arabic text classification accuracy. Two combination approaches were used: intersection (AND) and union (OR). The NB classification algorithm was used to classify a Saudi Press Agency dataset comprising 6,300 texts divided evenly into six classes (1,050 texts per class). Three feature representation schemas were used, namely Boolean, TFiDF and LTC. The experiments show a slight improvement in classification accuracy when two or three feature selection methods are combined. No improvement in classification accuracy was observed when four or all five feature selection methods were combined.
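The two combination approaches can be illustrated with a minimal sketch. It assumes that each feature selection method assigns a score to every term and that the top-k terms per method are kept before combining; the cutoff k, the function names `top_k` and `combine`, and the toy scores are illustrative assumptions, not values or procedures taken from the study.

```python
from functools import reduce


def top_k(scores, k):
    """Return the set of the k highest-scoring features for one method."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def combine(selections, mode):
    """Combine per-method feature sets by intersection (AND) or union (OR)."""
    if not selections:
        return set()
    if mode == "AND":
        return reduce(set.intersection, selections)
    if mode == "OR":
        return reduce(set.union, selections)
    raise ValueError("mode must be 'AND' or 'OR'")


# Hypothetical scores for four terms under two methods (toy data only).
chi_scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.3}
ig_scores = {"a": 0.7, "b": 0.2, "c": 0.6, "d": 0.1}

chi_top = top_k(chi_scores, 2)  # terms "a" and "b"
ig_top = top_k(ig_scores, 2)    # terms "a" and "c"

and_features = combine([chi_top, ig_top], "AND")  # only terms both methods keep
or_features = combine([chi_top, ig_top], "OR")    # terms either method keeps
```

Under this reading, intersection yields a smaller feature set on which the methods agree, while union yields a larger set that covers every method's choices; passing three or more sets to `combine` extends the same idea to larger combinations.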