Combining Relational and Distributional Knowledge for Word Sense Disambiguation

We present a new approach to word sense disambiguation derived from recent ideas in distributional semantics. The input to the algorithm is a large unlabeled corpus and a graph describing how senses are related; no sense-annotated corpus is needed. The fundamental idea is to embed meaning representations of senses in the same continuous-valued vector space as the representations of words. In this way, the knowledge encoded in the lexical resource is combined with the information derived by distributional methods. Once this step has been carried out, the sense representations can be plugged back into a distributional model such as skip-gram, which allows us to compute scores for the different possible senses of a word in a given context. We evaluated the new word sense disambiguation system on two Swedish test sets annotated with senses defined by the SALDO lexical resource. In both evaluations, our system soundly outperformed random and first-sense baselines. Its accuracy was slightly above that of a well-known graph-based system, while being computationally much more efficient.
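The scoring step described above can be illustrated with a minimal sketch. It assumes that each candidate sense already has a vector in the same space as the skip-gram context vectors, and it scores a sense by summing the log-sigmoid of its dot product with each context word's vector. This is one plausible reading of "plugging sense representations back into the skip-gram model", not necessarily the exact scoring function used in the system. All names (`score_sense`, `disambiguate`) and the two-dimensional toy vectors are hypothetical and chosen only for illustration.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def score_sense(sense_vec, context_vecs):
    # Skip-gram-style score: sum over context words of
    # log sigmoid(sense vector . context vector).
    return sum(log_sigmoid(dot(sense_vec, c)) for c in context_vecs)


def disambiguate(candidates, context_words, context_table):
    # candidates: dict mapping sense id -> sense vector (assumed given).
    # context_table: dict mapping word -> context vector (assumed given).
    # Context words without a vector are skipped.
    ctx = [context_table[w] for w in context_words if w in context_table]
    scores = {s: score_sense(v, ctx) for s, v in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy, hand-made vectors (hypothetical; not taken from SALDO or any corpus).
sense_vectors = {"rock_music": [0.9, 0.1], "rock_stone": [0.1, 0.9]}
context_table = {
    "guitar": [1.0, 0.0],
    "band": [0.8, 0.2],
    "granite": [0.0, 1.0],
}

best, scores = disambiguate(sense_vectors, ["guitar", "band"], context_table)
print(best, scores)
```

Because scoring needs only one dot product per candidate sense and context word, the cost grows linearly with the context window size and the number of candidate senses, which is consistent with the efficiency claim above.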
