A Comparative Study of Combined Feature Selection Methods for Arabic Text Classification

Text classification is an important task given the huge volume of electronic documents. One of its problems is the high dimensionality of the feature space. Researchers have proposed many algorithms to select relevant features from text. These algorithms have been studied extensively for English text, while studies for Arabic are still limited. This study investigates the performance of five widely used feature selection methods, namely Chi-square, Correlation, GSS Coefficient, Information Gain, and Relief-F. In addition, it introduces an approach that combines feature selection methods based on the average weight of the features. The experiments are conducted using Naive Bayes and Support Vector Machine classifiers to classify a published Arabic corpus. The results show that, among the individual methods, Information Gain gave the best results. The results also show that the combination of multiple feature selection methods outperforms the best results obtained by the individual methods.
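The idea of combining methods by the average weight of each feature can be illustrated with a minimal sketch. The abstract does not specify how scores from different methods are made comparable, so the min-max normalization below is an assumption, as are the function names (`min_max_normalize`, `combine_by_average`, `top_k`) and the toy scores, which are invented for illustration and are not results from the study.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one method's scores to [0, 1] (assumed step, not from the paper)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All features scored the same; no feature is preferred by this method.
        return {term: 0.0 for term in scores}
    return {term: (s - lo) / (hi - lo) for term, s in scores.items()}


def combine_by_average(score_tables: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the normalized weight each method assigns to every feature."""
    if not score_tables:
        raise ValueError("at least one score table is required")
    terms = set(score_tables[0])
    if any(set(table) != terms for table in score_tables):
        raise ValueError("all methods must score the same set of features")
    normalized = [min_max_normalize(table) for table in score_tables]
    return {
        term: sum(table[term] for table in normalized) / len(normalized)
        for term in terms
    }


def top_k(combined: Dict[str, float], k: int) -> List[str]:
    """Return the k features with the highest average weight (ties by name)."""
    return sorted(combined, key=lambda term: (-combined[term], term))[:k]


if __name__ == "__main__":
    # Hypothetical scores from two selection methods for three features.
    chi_square = {"t1": 10.0, "t2": 5.0, "t3": 0.0}
    info_gain = {"t1": 0.1, "t2": 0.4, "t3": 0.0}
    combined = combine_by_average([chi_square, info_gain])
    print(top_k(combined, 2))
```

In this toy case the two methods disagree on the best feature, and the averaged weight ranks `t2` ahead of `t1` because `t2` scores reasonably well under both methods. The selected features would then be passed to a classifier such as Naive Bayes or a Support Vector Machine.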
