AraConc, an Arabic Concordance Software Based on the DIINAR.1 Language Resource

This paper presents AraConc, a software tool devised by R. Abbes within the SILAT research team to extract concordances and frequency lists from Arabic texts. (The SILAT group, ‘Systèmes d’information, Ingénierie et Linguistique Arabe, Terminologie’, is part of the ICAR Lab, CNRS/Université Lumière-Lyon 2 and ENS-LSH.) AraConc is one of the analyzers based on the DIINAR.1 knowledge database (DIctionnaire INformatisé de l’ARabe, version 1). First, the authors introduce the functions included in the software, which are based on the specific structures of Arabic texts (many of which are shared by Semitic languages of the same group). Second, they deal with the difficulties encountered in the preprocessing of texts, namely that ordinary Arabic script is ‘unvowelled’, that Arabic wordforms have an agglutinative structure, and that these wordforms undergo various morphological variations. Third, they show the inadequacy of surface search techniques, and hence the value of a far-reaching morpho-syntactic analysis centred on the specific structures of the language, for building a genuine and thorough concordance tool. Examples of how AraConc operates are also given.
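To make the point about surface search concrete, here is a minimal sketch in Python. It is not AraConc's method and does not use DIINAR.1; the function names (`strip_diacritics`, `exact_hits`, `substring_hits`) and the four sample wordforms are assumptions chosen for illustration. It shows that stripping vowel marks handles the 'unvowelled' script issue, but that exact matching then misses agglutinated forms, while substring matching picks up an unrelated word.

```python
import re

# Arabic short-vowel marks (U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def strip_diacritics(text: str) -> str:
    """Reduce a vowelled wordform to its unvowelled spelling."""
    return DIACRITICS.sub("", text)


def exact_hits(tokens, keyword):
    """Surface search 1: the whole unvowelled token must equal the keyword."""
    target = strip_diacritics(keyword)
    return [t for t in tokens if strip_diacritics(t) == target]


def substring_hits(tokens, keyword):
    """Surface search 2: the keyword may occur anywhere inside the token."""
    target = strip_diacritics(keyword)
    return [t for t in tokens if target in strip_diacritics(t)]


# Illustrative wordforms, written with escapes so the code points are explicit.
KITAB = "\u0643\u062A\u0627\u0628"                         # kitab, 'book' (unvowelled)
KITABUN = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"     # kitabun, fully vowelled
AL_KITAB = "\u0627\u0644" + KITAB                          # al-kitab, article + noun
WA_KITABUHU = "\u0648" + KITAB + "\u0647"                  # wa-kitabu-hu, 'and his book'
KITABA = KITAB + "\u0629"                                  # kitaba, 'writing' (other lexeme)

tokens = [KITABUN, AL_KITAB, WA_KITABUHU, KITABA]

exact = exact_hits(tokens, KITAB)          # finds only the bare, vowelled form
loose = substring_hits(tokens, KITAB)      # finds all four, including 'writing'

print(len(exact), len(loose))
```

In this toy sample, exact matching misses the forms carrying the article, the conjunction and the suffixed pronoun, whereas substring matching also returns the different lexeme for 'writing'. Separating these cases requires segmenting each wordform and relating it to a lexical entry, which is the kind of analysis the abstract argues for.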
