An Ellipsis Resolution System for the Arabic Language

Ellipsis is an important topic in natural language processing because it occurs frequently in both dialogues and written texts. This article proposes an approach to ellipsis processing for the Arabic language. Our first contribution is a formal characterisation of the ellipsis phenomenon, which forms the basis of the proposed method for detecting elliptical sentence parts. We then present a clause grammar that distinguishes well-formed clauses from those with missing constituents. For resolution, the proposed method relies on a classification of elliptical sentences that underlies three distinct resolution processes: resolution by propagation, cascaded resolution, and resolution by alternation. We also address some ambiguities that arise in ellipsis resolution and study anaphora, a phenomenon that can interact with ellipsis. To demonstrate the feasibility of the proposed approaches, we developed a prototype called ERASE (Ellipsis Resolution of Arabic Sentences) and tested it on a corpus of elliptical Arabic sentences. The results obtained are satisfactory.