P-Stemmer or NLTK Stemmer for Arabic Text Classification?

Natural Language Processing (NLP) is a branch of computer science concerned with building systems that let computers communicate with people in everyday language. NLP tools are devoted to making computers understand statements written in human language. Indexing, text retrieval, and word processing pose challenges in the classification process. Hence, Arabic Natural Language Processing (ANLP) tools are needed to carry out these tasks. ANLP includes preprocessing steps such as stemming, normalization, stop-word removal, part-of-speech (POS) tagging, and other processes. In this work, we collected 1,000 news articles from the Alghad.com newspaper and classified the dataset with the Support Vector Machine (SVM) and Naive Bayes (NB) algorithms using the NLTK tool. We compared two stemmers, P-Stemmer and the NLTK stemmer, using this classification process. P-Stemmer yielded better classification results than the NLTK stemmer for both classifiers.
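The comparison described above, which swaps the stemmer while holding the classifier fixed, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the setup used in this work: `prefix_stem` is a hypothetical prefix-stripping stemmer with a made-up prefix list (not the published P-Stemmer rules), `identity` stands in for "no stemming", the Naive Bayes classifier is a minimal hand-written one rather than NLTK's, the SVM side is omitted, and the four short texts are toy examples rather than the Alghad.com dataset.

```python
from collections import Counter
import math

# Hypothetical prefix list for illustration only; NOT the published P-Stemmer rules.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")

def prefix_stem(word, min_len=3):
    """Strip at most one listed prefix, keeping at least min_len characters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            return word[len(prefix):]
    return word

def identity(word):
    """Baseline: leave the token unchanged (no stemming)."""
    return word

class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels, stem):
        self.stem = stem
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        self.vocab = set()
        for text, label in zip(docs, labels):
            tokens = [stem(token) for token in text.split()]
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [self.stem(token) for token in text.split()]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:  # ties keep the first class seen
                best_label, best_score = label, score
        return best_label

def accuracy(stem, train, test):
    """Train with the given stemmer and return accuracy on the test pairs."""
    model = NaiveBayes().fit([d for d, _ in train], [l for _, l in train], stem)
    return sum(model.predict(d) == l for d, l in test) / len(test)

# Toy data (assumed for illustration): training texts carry prefixes,
# test texts use the bare forms of the same words.
TRAIN = [("الفريق فاز بالمباراة", "sport"),
         ("الحكومة أعلنت الميزانية", "politics")]
TEST = [("فريق مباراة", "sport"),
        ("حكومة ميزانية", "politics")]

if __name__ == "__main__":
    for name, stem in [("prefix_stem", prefix_stem), ("identity", identity)]:
        print(name, accuracy(stem, TRAIN, TEST))
```

Because `accuracy` accepts any callable that maps a token to its stem, an NLTK Arabic stemmer could be passed in the same way, for example `ISRIStemmer().stem` from `nltk.stem.isri`. Which NLTK stemmer was used in this work is not stated above, so that choice is an assumption.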
