Impact of stemming on Arabic text summarization

Stemming is the process of reducing inflected words to their stem or root form. It is used in many text mining applications as a feature selection technique. Arabic text summarization, meanwhile, has become an increasingly important task in natural language processing (NLP). This paper evaluates the impact of three Arabic stemmers (Khoja, Larkey, and Alkhalil's stemmer) on text summarization performance for the Arabic language. The proposed system was evaluated on the dataset used with each of the three stemmers and without stemming. The Khoja stemmer achieved the best performance in terms of recall, precision, and F1-measure. The evaluation also shows that applying stemming in the pre-processing stage significantly improves the performance of the proposed system.
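As an illustration of the three reported metrics, the following minimal sketch computes precision, recall, and F1-measure for an extractive summary. It assumes, hypothetically, that a system summary and a reference summary are each represented as a set of selected sentence indices; the abstract does not state the matching unit actually used, and the function name and sample data are illustrative only.

```python
from typing import Set, Tuple


def precision_recall_f1(selected: Set[int], reference: Set[int]) -> Tuple[float, float, float]:
    """Score a system summary against a reference summary.

    Both arguments are sets of sentence indices (an assumption made
    for this sketch, not a detail taken from the paper).
    """
    overlap = len(selected & reference)
    # Precision: share of system-selected sentences that are in the reference.
    precision = overlap / len(selected) if selected else 0.0
    # Recall: share of reference sentences that the system selected.
    recall = overlap / len(reference) if reference else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the system picked sentences 0, 2, 5, 7;
# the reference summary contains sentences 0, 2, 3.
system_summary = {0, 2, 5, 7}
reference_summary = {0, 2, 3}
p, r, f1 = precision_recall_f1(system_summary, reference_summary)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Running the same scoring once per stemmer configuration (and once without stemming) would give the kind of side-by-side comparison the abstract describes.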
