MOSSA: a morpho-semantic knowledge extraction system for Arabic information retrieval

In this paper, we propose to exploit different morpho-semantic resources to enhance Arabic information retrieval (IR). We use standardised LMF (Lexical Markup Framework) Arabic dictionaries and Arabic corpora. Our goal is to take advantage of these existing resources to extract knowledge that is useful for Arabic IR. We also study the impact of Arabic morphology on IR effectiveness. Several query expansion strategies are carried out, based on morphological, semantic and morpho-semantic relations. In addition, the combination of these kinds of knowledge is studied and evaluated. We examine the effect of short diacritics and of part-of-speech (POS) disambiguation and tagging in the indexing step. A graph-based representation is used to formalise the knowledge resources. This representation is a powerful formalism for expressing the semantics of texts and for supporting NLP tools and applications such as IR. We report several experimental comparisons between the knowledge resources used and the IR approaches applied.
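To make the idea of relation-based query expansion over a graph more concrete, the following is a minimal sketch and not the MOSSA implementation. The class `KnowledgeGraph`, the function `expand_query`, the relation labels and the three transliterated terms are all hypothetical and chosen only for illustration. Treating "morpho-semantic" expansion as the union of both relation types is likewise an assumption of this sketch.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy undirected graph whose edges carry a relation label (hypothetical)."""

    def __init__(self):
        # term -> set of (neighbouring term, relation kind)
        self._edges = defaultdict(set)

    def add_relation(self, source, target, kind):
        # Store the edge in both directions so expansion is symmetric.
        self._edges[source].add((target, kind))
        self._edges[target].add((source, kind))

    def neighbours(self, term, kinds):
        # Return neighbours reachable through one of the requested relation kinds.
        return {t for t, k in self._edges.get(term, ()) if k in kinds}


def expand_query(graph, terms, kinds):
    """Append one-hop neighbours of each query term, keeping the original order."""
    expanded = list(terms)
    for term in terms:
        for candidate in sorted(graph.neighbours(term, kinds)):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Illustrative data only (transliterated; not taken from the paper's resources).
graph = KnowledgeGraph()
graph.add_relation("katib", "kitab", "morphological")  # writer / book, same root k-t-b
graph.add_relation("katib", "mu'allif", "semantic")    # writer / author, near-synonyms

print(expand_query(graph, ["katib"], {"morphological"}))
# ['katib', 'kitab']
print(expand_query(graph, ["katib"], {"semantic"}))
# ['katib', "mu'allif"]
print(expand_query(graph, ["katib"], {"morphological", "semantic"}))
# ['katib', 'kitab', "mu'allif"]
```

Labelling each edge with its relation kind lets the same graph serve all three expansion strategies by changing only the `kinds` argument, which is one simple way to compare them under identical conditions.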