Towards a Passages Extraction Method for Arabic Question Answering Systems

Question Answering Systems (QASs) aim to provide precise answers to questions written in natural language. Passage extraction is a challenging task that directly affects the performance of any QAS. In this paper, we propose a passage extraction method for Arabic Question Answering Systems. It consists of two steps: (1) formulating the query from the user's Arabic question and (2) extracting the candidate passages most likely to contain the correct answer. First, we describe query formulation, which uses stemmed words and a part-of-speech (POS) tagging process. Then, we identify relevant passages from Arabic Wikipedia using two levels of Information Retrieval (IR). In the first level, we extract relevant documents from Arabic Wikipedia based on both document titles and the Named Entities (NEs) contained in the formulated query. The second IR level extracts candidate passages from the pages retrieved in the first level based on their similarity to the query. This reduces the number of extracted passages, keeping only the N top-ranked ones. The preliminary results are promising, as they show a high level of similarity between a given question and the candidate passages.
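
The two IR levels described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the dictionary layout of a document, the line-based passage splitting, and the choice of bag-of-words cosine similarity are all assumptions made for illustration, and stemming, POS tagging, and Named Entity detection are left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simple word tokenizer (assumption); a real system would also stem Arabic words.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters (assumed similarity measure).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_documents(query_terms, documents):
    # Level 1: keep documents whose title shares at least one term with the query.
    # Named Entity matching is approximated here by plain term overlap.
    terms = set(query_terms)
    return [d for d in documents if terms & set(tokenize(d["title"]))]


def rank_passages(query_terms, documents, n=3):
    # Level 2: split each selected document into passages (one per line, an assumption),
    # score every passage against the query, and keep the n top-ranked ones.
    query_vec = Counter(query_terms)
    scored = []
    for d in documents:
        for passage in d["text"].split("\n"):
            if passage.strip():
                score = cosine(query_vec, Counter(tokenize(passage)))
                scored.append((score, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

Calling `select_documents` on the tokenized query and then `rank_passages` on its result mirrors the order of the two levels: the first call narrows the collection to a few pages, and the second ranks only the passages of those pages, which keeps the number of similarity computations small.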