GLÀFF, a Large Versatile French Lexicon

This paper introduces GLÀFF, a large, versatile French lexicon extracted from Wiktionary, the collaborative online dictionary. For each entry, GLÀFF provides inflectional features and phonemic transcriptions. It stands out from other available French lexicons by its size, its potential for constant updating and its copyleft license. We explain how we built GLÀFF and compare it with other well-known resources in terms of coverage and of the quality of its phonemic transcriptions. We show that its size and quality are strong assets that could allow GLÀFF to become a reference lexicon for French NLP and linguistics. Moreover, derived lexicons tailored to the specific needs of fields such as psycholinguistics can easily be built from GLÀFF.
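To make the notions of an entry with inflectional features and phonemic transcriptions, and of lexicon coverage, more concrete, here is a minimal Python sketch. It is illustrative only: the pipe-separated layout (form|tag|lemma|transcriptions), the tag strings and the `coverage` definition (share of reference word forms found in the lexicon) are hypothetical assumptions, not GLÀFF's actual distribution format or the evaluation procedure used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    form: str                  # inflected word form
    tag: str                   # morphosyntactic (inflectional) description; hypothetical tags
    lemma: str                 # citation form
    phonemes: tuple[str, ...]  # one or more phonemic transcriptions


def parse_line(line: str) -> Entry:
    # Hypothetical layout: form|tag|lemma|pron1;pron2 (not the real GLÀFF format)
    form, tag, lemma, prons = line.rstrip("\n").split("|")
    return Entry(form, tag, lemma, tuple(p for p in prons.split(";") if p))


def coverage(lexicon_forms: set[str], reference_forms: set[str]) -> float:
    """Share of reference word forms that the lexicon contains (assumed definition)."""
    if not reference_forms:
        return 0.0
    return len(reference_forms & lexicon_forms) / len(reference_forms)


lines = [
    "chats|Ncmp|chat|ʃa",
    "chat|Ncms|chat|ʃa",
    "fils|Ncms|fils|fis;fil",  # homograph with two pronunciations
]
entries = [parse_line(ln) for ln in lines]
forms = {e.form for e in entries}
print(coverage(forms, {"chat", "chats", "chien", "fils"}))  # 0.75
```

Storing transcriptions as a tuple lets a single written form such as "fils" keep several pronunciations, which is the kind of variation a phonemic lexicon has to represent.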
