Enhancing a large scale dictionary with a two-level system

We present a morphological analyzer and generator for French that combines a dictionary of 700,000 inflected words, called DELAF, with a full two-level system aimed at the analysis of new derivatives. The tool thus recognizes and generates both the correct inflected forms of French simple words (DELAF look-up procedure) and new derivatives together with their inflected forms (two-level analysis). A clear distinction is made between dictionary look-up and the analysis of new words, so that analyses involving heuristic rules are explicitly identified. We tested this tool on a French corpus of 1,300,000 words with significant results (Clemenceau D. 1992). With regard to efficiency, since the tool is compiled into a single transducer, it provides a very fast look-up procedure (1,100 words per second) at a low memory cost (around 1.3 Mb in RAM).

David Clemenceau & Emmanuel Roche
LADL: Laboratoire d'Automatique Documentaire et Linguistique
Université Paris 7; 2, place Jussieu, 75251 Paris cedex 05, France
e-mail: roche@max.ladl.jussieu.fr
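The separation between dictionary look-up and the analysis of new words can be illustrated with a minimal sketch. It is not the compiled transducer described here: the two-level rules are replaced by a naive prefix-stripping heuristic, and the names `TOY_DELAF`, `PREFIXES`, `Analysis` and `analyze`, as well as the entries and tag strings, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Toy stand-in for DELAF: inflected form -> (lemma, grammatical tags).
# Entries and tag strings are illustrative assumptions, not DELAF data.
TOY_DELAF = {
    "faire": ("faire", "V:W"),
    "fait": ("faire", "V:P3s"),
    "faisons": ("faire", "V:P1p"),
}

# Hypothetical productive prefixes. A real two-level system would apply
# morphophonological rules rather than plain string stripping.
PREFIXES = ("re", "dé")


@dataclass(frozen=True)
class Analysis:
    form: str
    lemma: str
    tags: str
    source: str  # "dictionary" or "heuristic"


def analyze(form):
    """Return dictionary analyses if any, else heuristic ones (possibly none)."""
    entry = TOY_DELAF.get(form)
    if entry is not None:
        # Attested form: answered by look-up only, no heuristic involved.
        return [Analysis(form, entry[0], entry[1], "dictionary")]

    results = []
    for prefix in PREFIXES:
        base = form[len(prefix):]
        if form.startswith(prefix) and base in TOY_DELAF:
            lemma, tags = TOY_DELAF[base]
            # Derived form: flagged so that callers can tell it is a guess.
            results.append(Analysis(form, prefix + lemma, tags, "heuristic"))
    return results
```

In this sketch every result carries a `source` field, so a caller can keep or discard the heuristic analyses: `analyze("fait")` is answered from the toy dictionary, while `analyze("refait")` is only reached through the prefix heuristic and is labelled accordingly.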