Sense Embedding Learning for Word Sense Induction

Conventional word sense induction (WSI) methods usually represent each instance with discrete linguistic features or co-occurrence features, and train a separate model for each polysemous word. In this work, we propose to learn sense embeddings for the WSI task. In the training stage, our method induces several sense centroids (embeddings) for each polysemous word. In the testing stage, our method represents each instance as a contextual vector and induces its sense by finding the nearest sense centroid in the embedding space. The advantages of our method are that (1) distributed sense vectors serve as the knowledge representations; they are trained discriminatively and usually perform better than traditional count-based distributional models, and (2) a general model for the whole vocabulary is jointly trained to induce sense centroids under the multitask learning framework. Evaluated on the SemEval-2010 WSI dataset, our method outperforms all participants and most of the recent state-of-the-art methods. We further verify the two advantages by comparing with carefully designed baselines.
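As a concrete illustration of the testing stage, here is a minimal sketch of nearest-centroid sense induction. It assumes the sense centroids and word embeddings have already been learned in the training stage; the averaging composition for the contextual vector, the use of cosine similarity, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def context_vector(context_words, word_vecs):
    # Represent an instance by averaging the embeddings of its context words.
    # (Averaging is an assumed composition; the abstract only says the
    # instance is represented "as a contextual vector".)
    vecs = [word_vecs[w] for w in context_words if w in word_vecs]
    if not vecs:
        raise ValueError("no known context words for this instance")
    return np.mean(vecs, axis=0)

def induce_sense(context_words, sense_centroids, word_vecs):
    # Induce the sense of an instance of a polysemous word by finding the
    # nearest sense centroid in the embedding space (cosine similarity here).
    ctx = context_vector(context_words, word_vecs)
    sims = [
        np.dot(ctx, c) / (np.linalg.norm(ctx) * np.linalg.norm(c))
        for c in sense_centroids
    ]
    return int(np.argmax(sims))  # index of the induced sense
```

For example, to label an instance of "bank", `sense_centroids` would hold the centroids induced for "bank" during training, and `context_words` would be the words surrounding that occurrence.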
