Morpho-lexical ambiguities in the recognition of written Arabic word-forms, evidence from the DIINAR.1 lexical resource

This paper addresses the ambiguities generated by the morpho-lexical system of Arabic itself, and not only by the fact that the standard writing of the language is 'unvowelled'. Evidence is drawn from the Arabic monolingual DIINAR.1 lexical resource (section II). The authors focus on morpho-lexical ambiguities, which are considered successively in vowelled and unvowelled writing. For lack of space, and to keep the presentation of data consistent, all the examples are verb structures. Statistical results are given for the level of ambiguity of prefix/suffix combinations and of pre-stem/post-stem combinations. Percentages are also given for the number of stems found in the conjugation of verbs. The numbers of stems related to a single verb, or to two or more verbs, are also analysed, and statistical results are given for stem-verb-root relations. Evidence from vowelled and unvowelled realisations of stems, verbs and word-forms is presented. Finally, root recognition is examined, and the results obtained by querying the DIINAR.1 lexical resource are compared, on this point, with those of the morphological analysis of a corpus of 2 million word occurrences (al-Ḥayāt newspaper, year 1995).
