Structured Generative Models of Continuous Features for Word Sense Induction

We propose a structured generative latent variable model that integrates information from multiple contextual representations for Word Sense Induction. Our approach jointly models global lexical, local lexical, and dependency syntactic context. Each context type is associated with a latent variable, and the three types of variables share a hierarchical structure. We use skip-gram based word and dependency context embeddings to construct all three types of representations, reducing the total number of parameters to be estimated and enabling better generalization. We describe an EM algorithm to efficiently estimate model parameters and use the Integrated Completed Likelihood (ICL) criterion to automatically estimate the number of senses. Our model achieves state-of-the-art results on the SemEval-2010 and SemEval-2013 Word Sense Induction datasets.
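The ICL criterion mentioned above augments BIC with an entropy penalty on the soft cluster assignments, so it prefers sense counts that yield well-separated clusters rather than merely a high likelihood. As an illustration (not the paper's hierarchical model), here is a minimal sketch of ICL-based selection of the number of components for a plain Gaussian mixture over embedding vectors, using scikit-learn; the function name and toy data are our own:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_icl(X, k_range, seed=0):
    """Pick the number of mixture components minimizing ICL.

    ICL = BIC + 2 * entropy of the posterior assignments
    (Biernacki, Celeux & Govaert, 2000). The entropy term penalizes
    overlapping components, favoring well-separated clusters.
    """
    best_k, best_icl = None, np.inf
    for k in k_range:
        gmm = GaussianMixture(n_components=k, covariance_type="diag",
                              random_state=seed).fit(X)
        tau = gmm.predict_proba(X)  # posterior responsibilities, shape (n, k)
        entropy = -np.sum(tau * np.log(np.clip(tau, 1e-12, None)))
        icl = gmm.bic(X) + 2.0 * entropy  # lower is better
        if icl < best_icl:
            best_k, best_icl = k, icl
    return best_k

# Toy data: two well-separated "sense" clusters in a 5-d embedding space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 0.5, (100, 5)),
               rng.normal(3.0, 0.5, (100, 5))])
print(select_k_by_icl(X, range(1, 5)))
```

In this sketch the entropy term is near zero when components are well separated, so ICL reduces to BIC; on ambiguous data it steers the selection toward fewer, cleaner senses.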
