Paper Information - Using Stemming in Morphological Analysis to Improve Arabic Information Retrieval


Information retrieval (IR) consists of finding all documents in a collection that are relevant to a user query. These documents are ranked by their probability of being relevant to the query, so the highest-ranked document is considered the most likely to be relevant. Natural Language Processing (NLP) for IR aims to transform the potentially ambiguous words of queries and documents into unambiguous internal representations on which matching and retrieval can take place. This transformation is generally achieved through several levels of linguistic analysis: morphological, syntactic, and so forth. In this paper, we present the Arabic linguistic analyzer used in the LIC2M cross-lingual search engine. We focus on the morphological analyzer, and particularly on the clitic stemmer, which segments input words into proclitics, simple forms, and enclitics. We demonstrate that stemming improves the search engine's recall and precision.
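To make the idea of clitic segmentation concrete, the following is a minimal sketch and not the LIC2M analyzer itself. The clitic inventories, the tiny lexicon, and the function name `segment` are all hypothetical and chosen only for illustration; the abstract does not specify the actual lists or the validation strategy. The sketch enumerates every split of a word into proclitic + simple form + enclitic and keeps only the splits whose simple form appears in the lexicon.

```python
# Toy clitic segmentation: proclitic + simple form + enclitic.
# ASSUMPTION: these inventories and this lexicon are illustrative only;
# they are not taken from the paper.
PROCLITICS = ["", "و", "ب", "ل", "ال", "وال", "بال"]
ENCLITICS = ["", "ه", "ها", "هم"]
LEXICON = {"كتاب", "بيت"}  # hypothetical dictionary of simple forms


def segment(word):
    """Return all (proclitic, simple_form, enclitic) splits of `word`
    whose simple form is found in LEXICON."""
    results = []
    for pro in PROCLITICS:
        if not word.startswith(pro):
            continue
        rest = word[len(pro):]
        for enc in ENCLITICS:
            if not rest.endswith(enc):
                continue
            # Strip the enclitic (if any) to obtain the candidate simple form.
            stem = rest[:len(rest) - len(enc)] if enc else rest
            if stem in LEXICON:
                results.append((pro, stem, enc))
    return results


if __name__ == "__main__":
    for w in ["وكتابها", "بالبيت", "بيت"]:
        print(w, segment(w))
```

Checking the candidate simple form against a lexicon is one way to avoid over-stripping: in this sketch, the word "بيت" is kept whole even though it begins with a letter that can also act as a proclitic. Indexing both queries and documents by the simple form lets a query for one surface form match documents containing other cliticized forms of the same word.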

Nasredine Semmar | Meriama Laïb | Christian Fluhr
