Wiktionnaire's Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary

GLAWI is a free, large-scale and versatile Machine-Readable Dictionary (MRD) that has been extracted from the French language edition of Wiktionary, called Wiktionnaire. In (Sajous and Hathout, 2015), we introduced GLAWI, gave the rationale behind the creation of this lexicographic resource and described the extraction process, focusing on the conversion and standardization of the heterogeneous data provided by this collaborative dictionary. In the current article, we describe the content of GLAWI and illustrate how it is structured. We also suggest various applications, ranging from linguistic studies and NLP applications to psycholinguistic experimentation. All of them can take advantage of the diversity of the lexical knowledge available in GLAWI. Besides this diversity and its extensive lexical coverage, GLAWI is also remarkable because it is the only free lexical resource of contemporary French that contains definitions. This unique material opens the way to a renewal of MRD-based methods, notably the automated extraction and acquisition of semantic relations.

[1] Bruno Gaume, et al. Skillex, an action labelling efficiency score: the case for French and Mandarin, 2014, CogSci.

[2] Li Wang, et al. How Noisy Social Media Text, How Diffrnt Social Media Sources?, 2013, IJCNLP.

[3] Nabil Hathout, et al. Acquisition and enrichment of morphological and morphosemantic knowledge from the French Wiktionary, 2014, LG-LP@COLING.

[4] Chris Brew, et al. Using the Wiktionary Graph Structure for Synonym Detection, 2009, PWNLP@IJCNLP.

[5] Basilio Calderone, et al. Phonotactic probabilities in Italian simplex and complex words: a fragment priming study, 2015, NetWordS.

[6] Oren Etzioni, et al. Compiling a Massive, Multilingual Dictionary via Probabilistic Inference, 2009, ACL.

[7] Jörg Tiedemann, et al. Efficient Discrimination Between Closely Related Languages, 2012, COLING.

[8] Bruno Gaume, et al. SLAM. Automatic lexical solutions for metaphors, 2009, Trait. Autom. des Langues.

[10] Jan Snajder, et al. Derivational Smoothing for Syntactic Distributional Semantics, 2013, ACL.

[11] Nabil Hathout, et al. GLÀFF, a Large Versatile French Lexicon, 2014, LREC.

[12] Marta R. Costa-jussà, et al. Holaaa!! writin like u talk is kewl but kinda hard 4 NLP, 2012, LREC.

[13] Matej Rojc, et al. Time and space-efficient architecture for a corpus-based text-to-speech synthesis system, 2007, Speech Commun.

[14] Emmanuel Navarro, et al. Semi-automatic Endogenous Enrichment of Collaboratively Constructed Lexical Resources: Piggybacking onto Wiktionary, 2010, IceTAL.

[15] Nabil Hathout, et al. GLAWI, a free XML-encoded Machine-Readable Dictionary built from the French Wiktionary, 2015.

[16] Weiyi Meng, et al. Using the Structure of HTML Documents to Improve Retrieval, 1997, USENIX Symposium on Internet Technologies and Systems.

[17] Roberto Navigli, et al. The English lexical substitution task, 2009, Lang. Resour. Evaluation.

[18] Assaf Urieli, et al. Robust French syntax analysis: reconciling statistical methods and linguistic knowledge in the Talismane toolkit. (Analyse syntaxique robuste du français : concilier méthodes statistiques et connaissances linguistiques dans l'outil Talismane), 2013.

[19] Ludovic Tanguy, et al. A Multitude of Linguistically-rich Features for Authorship Attribution - Notebook for PAN at CLEF 2011, 2011, CLEF.

[20] Emmanuel Navarro, et al. Semi-automatic enrichment of crowdsourced synonymy networks: the WISIGOTH system applied to Wiktionary, 2011, Language Resources and Evaluation.

[21] Nabil Hathout, et al. Démonette, a French derivational morpho-semantic network, 2014, LILT.

[22] Nabil Hathout. Morphonette: a paradigm-based morphological network, 2011.

[23] Sebastian Riedel, et al. The CoNLL 2007 Shared Task on Dependency Parsing, 2007, EMNLP.

[24] David Yarowsky, et al. Toward Statistical Machine Translation without Parallel Corpora, 2012, EACL.