Wiktionary and NLP: improving synonymy networks

Wiktionary, a satellite of the Wikipedia initiative, is a potential resource for Natural Language Processing (NLP). However, it must be processed before it can be used efficiently as an NLP resource. After describing the aspects of Wiktionary that are relevant for our purposes, we focus on its structural properties. We then describe how we extracted synonymy networks from this resource. We provide an in-depth study of these synonymy networks and compare them to those extracted from traditional resources. Finally, we describe two methods for semi-automatically improving these networks by adding missing relations: (i) using a semantic proximity measure; (ii) using the translation relations of Wiktionary itself.
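To make method (ii) more concrete, here is a minimal sketch of how translation relations might suggest missing synonymy links. It is not the authors' procedure: the function name `candidate_synonyms`, the `min_shared` threshold, and the toy data are all assumptions made for illustration. The idea shown is only that two words of the same language sharing several translations are plausible synonym candidates, which a human can then validate.

```python
from collections import defaultdict
from itertools import combinations


def candidate_synonyms(translations, known_synonyms, min_shared=2):
    """Rank unlinked word pairs by the number of translations they share.

    translations: dict mapping a word to a set of (language, translation) pairs.
    known_synonyms: iterable of word pairs already linked in the network.
    min_shared: hypothetical threshold on shared translations (an assumption).
    """
    # Invert the mapping: each (language, translation) -> words that have it.
    by_target = defaultdict(set)
    for word, targets in translations.items():
        for target in targets:
            by_target[target].add(word)

    # Count shared translations for each unordered pair of source words.
    shared = defaultdict(int)
    for words in by_target.values():
        for a, b in combinations(sorted(words), 2):
            shared[(a, b)] += 1

    # Drop pairs that are already synonyms; keep those above the threshold.
    known = {tuple(sorted(pair)) for pair in known_synonyms}
    candidates = [
        (pair, count)
        for pair, count in shared.items()
        if count >= min_shared and pair not in known
    ]
    # Most shared translations first, then alphabetical for a stable order.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Toy data, invented for this example (not taken from Wiktionary).
toy_translations = {
    "car": {("fr", "voiture"), ("de", "Auto"), ("es", "coche")},
    "automobile": {("fr", "voiture"), ("de", "Auto"), ("fr", "automobile")},
    "auto": {("fr", "voiture"), ("de", "Auto")},
    "vehicle": {("fr", "véhicule"), ("de", "Fahrzeug")},
}
toy_known = {("car", "automobile")}

suggestions = candidate_synonyms(toy_translations, toy_known)
```

The output is a ranked list of suggestions rather than a set of edges added directly, which reflects the semi-automatic setting: candidates are proposed and a contributor decides whether to accept them.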
