A Hybrid Feature Selection Technique for Classification of Group-based Holy Quran Verses

Text classification is primarily applied to document labeling. However, existing feature selection (FS) techniques have two major drawbacks: wrapper-based techniques incur high computational runtime, and filter-based techniques yield low classification accuracy. In this paper, a hybrid feature selection technique is proposed that combines filter-based information gain (IG) with a wrapper-based CFS algorithm. The purpose of this combination is to achieve both the high classification accuracy associated with wrappers and the lower computational runtime associated with filters. The proposed IG-CFS technique is then applied to label Quranic verses of al-Baqara and al-Anaam from two major references, the English translation and the commentary (tafsir). StringToWordVector with weighted TF-IDF was used to preprocess the textual data, and four classifiers were evaluated: naïve Bayes, libSVM, k-NN, and decision trees (J48). The highest overall classification accuracy of 94.5% was achieved at a runtime of 3.89 seconds with the proposed IG-CFS technique.
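The two-stage idea can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper: the features are binary term-presence indicators, there are only two classes, the first stage keeps the top-k terms by information gain, and the second stage is a greedy forward search over a correlation-based merit score with Pearson correlation standing in for whatever correlation measure the authors used. The toy terms, labels, and function names are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy obtained by splitting on one feature column."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def pearson(a, b):
    """Absolute Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return abs(cov / math.sqrt(var_a * var_b))


def cfs_merit(subset, columns, target):
    """Correlation-based merit: rewards class relevance, penalises redundancy."""
    k = len(subset)
    r_cf = sum(pearson(columns[f], target) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(pearson(columns[f], columns[g]) for f, g in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def ig_then_cfs(columns, labels, top_k):
    """Stage 1: keep the top_k terms by IG. Stage 2: greedy forward search on merit."""
    ranked = sorted(columns, key=lambda f: information_gain(columns[f], labels), reverse=True)
    candidates = ranked[:top_k]
    # Assumption: two classes only, encoded as 1 for the first label seen, else 0.
    target = [1 if y == labels[0] else 0 for y in labels]
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for f in candidates:
            if f in selected:
                continue
            merit = cfs_merit(selected + [f], columns, target)
            if merit > best + 1e-12:
                best, choice, improved = merit, f, True
        if improved:
            selected.append(choice)
    return candidates, selected


# Hypothetical toy corpus: six documents, five terms, two topic labels.
labels = ["prayer", "prayer", "prayer", "fasting", "fasting", "fasting"]
columns = {
    "pray":  [1, 1, 1, 0, 0, 0],
    "salat": [1, 1, 1, 0, 0, 0],  # duplicates "pray", so it is redundant
    "fast":  [0, 0, 1, 1, 1, 1],
    "the":   [1, 1, 1, 1, 1, 1],  # constant, carries no class information
    "and":   [1, 0, 1, 1, 0, 1],
}

candidates, selected = ig_then_cfs(columns, labels, top_k=3)
print("after IG filter:", candidates)
print("after merit search:", selected)
```

On this toy data the IG stage discards the uninformative terms cheaply, and the second stage then drops the term that merely duplicates one already selected. That division of labour is the motivation the abstract gives for the hybrid, although the actual experiments were run with different tooling and data.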
