Extending and Improving Wordnet via Unsupervised Word Embeddings

This work presents an unsupervised approach for improving WordNet that builds upon recent advances in document and sense representation via distributional semantics. We apply our methods to construct Wordnets in French and Russian, languages which both lack good manual constructions. These are evaluated on two new 600-word test sets for word-to-synset matching and found to improve greatly upon synset recall, outperforming the best automated Wordnets in F-score. Our methods require very few linguistic resources, making them applicable to Wordnet construction in low-resource languages, and may further be applied to sense clustering and other Wordnet improvements.
