Automated WordNet Construction Using Word Embeddings

We present a fully unsupervised method for automated construction of WordNets based upon recent advances in distributional representations of sentences and word-senses combined with readily available machine translation tools. The approach requires very few linguistic resources and is thus extensible to multiple target languages. To evaluate our method we construct two 600-word test sets for word-to-synset matching in French and Russian using native speakers and evaluate the performance of our method along with several other recent approaches. Our method exceeds the best language-specific and multi-lingual automated WordNets in F-score for both languages. The databases we construct for French and Russian, both languages without large publicly available manually constructed WordNets, will be publicly released along with the test sets.

[1]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[2]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[3]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[4]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[5]  Sergey Yablonsky,et al.  Semi-Automated English-Russian WordNet Construction: Initial Resources, Software and Methods of Translation , 2006 .

[6]  Daniel Jurafsky,et al.  Learning to Merge Word Senses , 2007, EMNLP.

[7]  Benoît Sagot,et al.  Building a free French wordnet from multilingual resources , 2008 .

[8]  Gerhard Weikum,et al.  Towards a universal wordnet by learning from combined evidence , 2009, CIKM.

[9]  Ewan Klein,et al.  Natural Language Processing with Python , 2009 .

[10]  Heshaam Faili,et al.  Automatic Persian WordNet Construction , 2010, COLING.

[11]  Simone Paolo Ponzetto,et al.  BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network , 2012, Artif. Intell..

[12]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[13]  Francis Bond,et al.  Linking and Extending an Open Multilingual Wordnet , 2013, ACL.

[14]  Gaël de Chalendar,et al.  WoNeF, an improved, expanded and evaluated automatic French translation of WordNet , 2014, GWC.

[15]  Manaal Faruqui,et al.  Improving Vector Space Word Representations Using Multilingual Correlation , 2014, EACL.

[16]  Gilles Sérasset,et al.  Induction de sens pour enrichir des ressources lexicales , 2014 .

[17]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[18]  Sanjeev Arora,et al.  RAND-WALK: A Latent Variable Model Approach to Word Embeddings , 2015 .

[19]  Hinrich Schütze,et al.  AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes , 2015, ACL.

[20]  Ignacio Iacobacci,et al.  SensEmbed: Learning Sense Embeddings for Word and Relational Similarity , 2015, ACL.

[21]  Kevin Gimpel,et al.  Towards Universal Paraphrastic Sentence Embeddings , 2015, ICLR.

[22]  Jugal Kalita,et al.  Enhancing Automatic Wordnet Construction Using Word Embeddings , 2016 .

[23]  Sanjeev Arora,et al.  A Simple but Tough-to-Beat Baseline for Sentence Embeddings , 2017, ICLR.

[24]  Sanjeev Arora,et al.  Linear Algebraic Structure of Word Senses, with Applications to Polysemy , 2016, TACL.