Semantic Information Retrieval: A Comparative Experimental Study of NLP Tools and Language Resources for Arabic

In this paper, we exploit the semantic richness of the Arabic language for Information Retrieval (IR). The semantics of Arabic words can be extracted from dictionaries or corpora, which are analyzed and mined using Natural Language Processing (NLP) and text mining tools. This makes it possible to model the contextual dependencies between words, which help identify the meaning of queries during search. Queries are thus enriched with semantic knowledge, which enhances search performance. In this context, the paper describes a text mining-based approach to Arabic semantic IR that takes the senses of query terms into account. We discuss experiments and results obtained on a standard Arabic test collection. On the one hand, we compare dictionary-based and corpus-based approaches to modeling semantics. On the other hand, we compare several Arabic NLP tools in the preprocessing step, and thereby study the effect of Arabic morphology on the semantic interpretation of queries.
