Extending wordnets by learning from multiple resources

In this paper we present an automatic, language-independent approach to extending an existing wordnet by reusing freely available bilingual resources, such as machine-readable dictionaries and online encyclopaedias. The approach is applied to Slovene and French. Each word extracted from the bilingual resources is assigned one or several synset ids by a classifier that relies on several features, including distributional similarity. Automatic and manual evaluations show that the resulting extensions of sloWNet and WOLF are lexico-semantic repositories with both high coverage and high quality.
