Feature selection method using improved CHI Square on Arabic text classifiers: analysis and application

Text classification can be defined as the task of assigning texts to predefined groups according to their content. Over the past few years, the volume of information available on the Internet across varied fields has grown, making text classification one of the most important, yet challenging, tasks. Text classification is commonly employed in numerous applications and for different objectives. The extensive use of the Internet, particularly in the Arab world, together with the massive number of documents and pages available in Arabic, has raised the need for suitable tools that classify these pages and documents by their main categories. The aim of this paper is to study the effect of the improved CHI (ImpCHI) square on the performance of six well-known classifiers: Random Forest, Decision Tree, Naive Bayes, Naive Bayes Multinomial, Bayes Net, and Artificial Neural Networks. These proposed techniques are important for improving the classification of Arabic documents and can be regarded as a promising basis for the text classification stage, because they contribute to classifying texts into predefined categories. This combination method takes advantage of more than one technique, which can produce better final results. The dataset employed in this paper includes 9055 Arabic documents collected from various Arabic resources. Based on their content, these documents were divided into twelve categories. Four performance evaluation criteria were used: F-measure, recall, precision, and model build time. The experimental results show that ImpCHI square gives better classification results than the standard CHI square method with all studied classifiers, in terms of all performance criteria used.
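For orientation, the baseline that the abstract calls the normal CHI square method is commonly computed per (term, category) pair from a 2x2 document-count table. The sketch below shows only that standard statistic, not ImpCHI, whose definition is not given in this text. The function names `chi_square` and `score_terms` and the toy English tokens are hypothetical and chosen for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square score for one (term, category) pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A margin of the table is empty, so the term carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def score_terms(docs, labels, category):
    """Score every term against one category.

    docs: list of token lists (assumed already tokenized and stemmed)
    labels: list of category names, one per document
    """
    n_docs = len(docs)
    in_cat = sum(1 for y in labels if y == category)
    df_total = Counter()  # document frequency over the whole corpus
    df_in = Counter()     # document frequency inside the category
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            df_total[term] += 1
            if y == category:
                df_in[term] += 1
    scores = {}
    for term in df_total:
        a = df_in[term]
        b = df_total[term] - a
        c = in_cat - a
        d = n_docs - in_cat - b
        scores[term] = chi_square(a, b, c, d)
    return scores


# Toy corpus (hypothetical data, not the paper's dataset).
docs = [["goal", "match"], ["goal", "team"], ["vote", "party"], ["vote", "law"]]
labels = ["sport", "sport", "politics", "politics"]
scores = score_terms(docs, labels, "sport")
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
```

In a feature selection step, the highest-scoring terms per category would typically be kept as the vocabulary passed to the classifier; how many terms to keep is a tuning choice that this text does not specify.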
