Improving Kurdish Web Mining through Tree Data Structure and Porter’s Stemmer Algorithms

Stemming is one of the most important preprocessing techniques for improving the accuracy of text classification. Its key purpose is to merge words that share the same stem, which reduces the high dimensionality of the feature space. A smaller feature space shortens the time needed to build a model and lowers memory use. In this paper, a new stemming approach is explored for enhancing Kurdish text classification performance. A tree data structure and Porter's stemmer algorithm are combined to build the proposed approach. The system is assessed using Support Vector Machine (SVM) and Decision Tree (C4.5) classifiers, comparing performance before and after the suggested stemmer is applied. Furthermore, the effect of using stop words is examined before and after applying the suggested approach.
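The abstract does not spell out how the tree structure and the stemmer interact, so the following is only a minimal sketch of the general idea: suffixes are stored in a trie (built over reversed suffixes), the longest matching suffix is stripped, and words that share a stem collapse into one feature. The class `SuffixTrie`, the function `stem`, the `min_stem_len` threshold, and the English-like suffix list are all assumptions made for illustration; they are not the paper's Kurdish rules or its actual algorithm.

```python
_END = object()  # sentinel key marking the end of a complete suffix


class SuffixTrie:
    """Trie over reversed suffixes, so lookup walks a word from its end."""

    def __init__(self, suffixes):
        self.root = {}
        for suffix in suffixes:
            node = self.root
            for ch in reversed(suffix):
                node = node.setdefault(ch, {})
            node[_END] = True

    def longest_suffix_len(self, word):
        """Return the length of the longest stored suffix that ends `word`."""
        node, best = self.root, 0
        for depth, ch in enumerate(reversed(word), start=1):
            node = node.get(ch)
            if node is None:
                break
            if _END in node:
                best = depth
        return best


def stem(word, trie, min_stem_len=3):
    """Strip the longest known suffix, keeping at least `min_stem_len` characters."""
    n = trie.longest_suffix_len(word)
    if n and len(word) - n >= min_stem_len:
        return word[:-n]
    return word


# Hypothetical English-like suffix list, for illustration only.
SUFFIXES = ["s", "es", "ing", "ed", "er"]
trie = SuffixTrie(SUFFIXES)

words = ["walk", "walks", "walked", "walking", "walker", "talk", "talks"]
stems = [stem(w, trie) for w in words]

vocab_before = len(set(words))  # distinct features without stemming
vocab_after = len(set(stems))   # distinct features after stemming
print(vocab_before, vocab_after)
```

In this toy vocabulary, seven surface forms reduce to two stems, which is the kind of feature-space reduction the abstract attributes to stemming. A trie keeps the longest-match lookup proportional to the length of the suffix rather than to the number of suffix rules.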
