Extending and Improving Wordnet via Unsupervised Word Embeddings

This work presents an unsupervised approach for improving WordNet that builds upon recent advances in document and sense representation via distributional semantics. We apply our methods to construct Wordnets in French and Russian, two languages that lack good manually constructed Wordnets. The resulting Wordnets are evaluated on two new 600-word test sets for word-to-synset matching; they substantially improve synset recall and outperform the best existing automated Wordnets in F-score. Because our methods require very few linguistic resources, they are applicable to Wordnet construction in low-resource languages, and they may further be applied to sense clustering and other Wordnet improvements.
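The abstract does not spell out how a word is matched to synsets or how the F-score is computed, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each candidate synset can be represented by an unweighted average of its gloss-word vectors in the same embedding space as the target word, keeps every candidate whose cosine similarity to the word clears a threshold, and then scores the predicted (word, synset) pairs against a gold standard with micro-averaged precision, recall and F-score. The toy vectors, the synset IDs such as `bank(finance)`, the 0.5 threshold, and every function name are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical 3-d vectors standing in for pretrained target-language
# word embeddings; real embeddings have hundreds of dimensions.
EMB: Dict[str, List[float]] = {
    "banque": [0.9, 0.1, 0.0],
    "rive": [0.0, 0.1, 1.0],
    "argent": [1.0, 0.0, 0.1],
    "institution": [0.8, 0.2, 0.0],
    "rivière": [0.1, 0.0, 0.9],
    "terre": [0.0, 0.3, 0.8],
}

# Hypothetical candidate synsets (e.g. proposed by translating the word into
# English), each described by gloss words assumed to live in the same space.
CANDIDATES: Dict[str, List[str]] = {
    "bank(finance)": ["institution", "argent"],
    "bank(land)": ["terre", "rivière"],
}


def mean_vector(words: List[str]) -> List[float]:
    """Unweighted average of the vectors of the gloss words we have embeddings for."""
    vecs = [EMB[w] for w in words if w in EMB]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is empty or all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def match(word: str, threshold: float = 0.5) -> Set[str]:
    """Keep every candidate synset whose gloss vector is close enough to the word."""
    w = EMB[word]
    return {sid for sid, gloss in CANDIDATES.items()
            if cosine(w, mean_vector(gloss)) >= threshold}


def precision_recall_f1(pred: Dict[str, Set[str]],
                        gold: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score over (word, synset) pairs."""
    tp = sum(len(pred.get(w, set()) & gold[w]) for w in gold)
    n_pred = sum(len(s) for s in pred.values())
    n_gold = sum(len(s) for s in gold.values())
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


pred = {w: match(w) for w in ("banque", "rive")}
# Gold may list synsets that were never proposed as candidates; these cap recall.
gold = {"banque": {"bank(finance)", "bank(building)"}, "rive": {"bank(land)"}}
print(pred)
print(precision_recall_f1(pred, gold))
```

In this toy run every prediction is correct, but the gold synset `bank(building)` was never among the candidates, so recall rather than precision is what holds the F-score down. That illustrates, under these assumptions, why raising synset recall matters for the overall score.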
