On Modeling Sense Relatedness in Multi-prototype Word Embedding

To enhance the expressive power of distributional word representation models, many researchers induce word senses through clustering and learn multiple embedding vectors per word, yielding multi-prototype word embedding models. However, most related work ignores the relatedness among word senses, which in fact plays an important role. In this paper, we propose a novel approach to capturing word sense relatedness in a multi-prototype word embedding model. In particular, we differentiate the original sense of a word from its extended senses using global occurrence information, and model their relatedness using local textual context. Based on the idea of fuzzy clustering, we introduce a random process to integrate these two types of senses and design two non-parametric methods for word sense induction. To make the model scalable and efficient, we use an online joint learning framework extended from the Skip-gram model. Experimental results demonstrate that our model outperforms both conventional single-prototype embedding models and other multi-prototype embedding models, and achieves more stable performance when trained on smaller data.
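
To make the general setup concrete, below is a minimal, hypothetical Python sketch of an online, non-parametric multi-prototype Skip-gram. Everything in it is an assumption for illustration: the names (SenseInventory, SIM_THRESHOLD, train_step) are invented, sense induction is reduced to a simple similarity threshold rather than the paper's fuzzy-clustering random process, and the distinction between original and extended senses, as well as the sense-relatedness modeling, is omitted.

```python
import numpy as np

# Minimal sketch of an online, non-parametric multi-prototype Skip-gram.
# All names, thresholds, and hyperparameters here are illustrative
# assumptions; the actual model in the paper additionally distinguishes
# original vs. extended senses and models their relatedness.

DIM = 50              # embedding dimensionality (assumed)
SIM_THRESHOLD = 0.3   # below this similarity a new sense is spawned (assumed)
NEGATIVES = 3         # negative samples per positive pair (assumed)
LR = 0.025            # learning rate (assumed)

rng = np.random.default_rng(0)


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class SenseInventory:
    """One global (output) vector per word plus a growing list of sense
    (input) vectors; senses are added on demand, i.e. non-parametrically."""

    def __init__(self):
        self.global_vec = {}   # word -> output/context vector
        self.sense_vecs = {}   # word -> list of sense vectors

    def _init_vec(self):
        return (rng.random(DIM) - 0.5) / DIM

    def get_global(self, word):
        return self.global_vec.setdefault(word, self._init_vec())

    def assign_sense(self, word, context_vec):
        """Pick the sense most similar to the local context, or create a
        new sense if none is similar enough (a hard threshold here; the
        paper instead uses a fuzzy-clustering-style random process)."""
        senses = self.sense_vecs.setdefault(word, [self._init_vec()])
        sims = [cosine(context_vec, s) for s in senses]
        best = int(np.argmax(sims))
        if sims[best] < SIM_THRESHOLD:
            senses.append(self._init_vec())
            best = len(senses) - 1
        return senses[best]


def train_step(inv, center, context_words, vocab):
    """One online step: induce a sense for the centre word from its local
    context, then apply Skip-gram updates with negative sampling."""
    ctx_vec = np.mean([inv.get_global(w) for w in context_words], axis=0)
    sense_vec = inv.assign_sense(center, ctx_vec)

    for ctx in context_words:
        # one positive pair plus a few random negatives (collisions with the
        # true context word are simply ignored in this sketch)
        pairs = [(ctx, 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(NEGATIVES)]
        for word, label in pairs:
            out = inv.get_global(word)
            grad = (sigmoid(float(np.dot(sense_vec, out))) - label) * LR
            sense_update = grad * out       # use the pre-update output vector
            out -= grad * sense_vec         # in-place update of stored vector
            sense_vec -= sense_update       # in-place update of stored sense


if __name__ == "__main__":
    corpus = [["bank", "river", "water"], ["bank", "money", "loan"]]
    vocab = sorted({w for sent in corpus for w in sent})
    inv = SenseInventory()
    for sent in corpus:
        for i, center in enumerate(sent):
            context = [w for j, w in enumerate(sent) if j != i]
            train_step(inv, center, context, vocab)
    print({w: len(s) for w, s in inv.sense_vecs.items()})
```

The sketch only illustrates the two ingredients the abstract names explicitly: online word sense induction from local context and joint Skip-gram-style training of the resulting sense vectors.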
