A Sense-Topic Model for Word Sense Induction with Unsupervised Data Enrichment

Word sense induction (WSI) seeks to automatically discover the senses of a word in a corpus via unsupervised methods. We propose a sense-topic model for WSI, which treats sense and topic as two separate latent variables to be inferred jointly. Topics are informed by the entire document, while senses are informed by the local context surrounding the ambiguous word. We also discuss unsupervised ways of enriching the original corpus in order to improve model performance, including using neural word embeddings and external corpora to expand the context of each data instance. We demonstrate significant improvements over the previous state-of-the-art, achieving the best results reported to date on the SemEval-2013 WSI task.

[1]  Jean Véronis,et al.  HyperLex: lexical cartography for information retrieval , 2004, Comput. Speech Lang..

[2]  Xuchen Yao,et al.  Nonparametric Bayesian Word Sense Induction , 2011, Graph-based Methods for Natural Language Processing.

[3]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[4]  William D. Lewis,et al.  Intelligent Selection of Language Model Training Data , 2010, ACL.

[5]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[6]  Nathanael Chambers,et al.  Template-Based Information Extraction without the Templates , 2011, ACL.

[7]  Dominic Widdows,et al.  Discovering Corpus-Specific Word Senses , 2003, EACL.

[8]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[9]  Xiaojin Zhu,et al.  A Topic Model for Word Sense Disambiguation , 2007, EMNLP.

[10]  Elie Bienenstock,et al.  Sphere Embedding: An Application to Part-of-Speech Induction , 2010, NIPS.

[11]  Geoffrey E. Hinton,et al.  Three new graphical models for statistical language modelling , 2007, ICML '07.

[12]  David Maxwell Chickering,et al.  Dependency Networks for Inference, Collaborative Filtering, and Data Visualization , 2000, J. Mach. Learn. Res..

[13]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[14]  Ted Pedersen,et al.  Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces , 2004, CoNLL.

[15]  Eneko Agirre,et al.  Semeval-2007 Task 2 : Evaluating Word Sense Induction and Discrimination , 2007 .

[16]  Mirella Lapata,et al.  Bayesian Word Sense Induction , 2009, EACL.

[17]  Donald Geman,et al.  Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images , 1984 .

[18]  Kevin Gimpel,et al.  Tailoring Continuous Word Representations for Dependency Parsing , 2014, ACL.

[19]  Yuji Matsumoto,et al.  An Empirical Investigation of Word Representations for Parsing the Web , 2013 .

[20]  Dean P. Foster,et al.  Two Step CCA: A new spectral method for estimating vector models of words , 2012, ICML 2012.

[21]  Katrin Erk,et al.  Graded Word Sense Assignment , 2009, EMNLP.

[22]  Nancy Ide,et al.  The American National Corpus First Release , 2004, LREC.

[23]  David Jurgens,et al.  Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels , 2013, NAACL.

[24]  Daphne Koller,et al.  Word-Sense Disambiguation for Machine Translation , 2005, HLT.

[25]  Gregor Heinrich Parameter estimation for text analysis , 2009 .

[26]  Mark Johnson,et al.  A Bayesian LDA-based model for semi-supervised part-of-speech tagging , 2007, NIPS.

[27]  Marine Carpuat,et al.  Word Sense Disambiguation vs. Statistical Machine Translation , 2005, ACL.

[28]  Timothy Baldwin,et al.  Word Sense Induction for Novel Sense Detection , 2012, EACL.

[29]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[30]  Michael J. Paul,et al.  Cross-Cultural Analysis of Blogs and Forums with Mixed-Collection Topic Models , 2009, EMNLP.

[31]  Enis Sert,et al.  AI-KU: Using Substitute Vectors and Co-Occurrence Modeling For Word Sense Induction and Disambiguation , 2013, SemEval@NAACL-HLT.

[32]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[33]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[34]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[35]  Caroline Sporleder,et al.  Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection , 2010, ACL.

[36]  Magnus Sahlgren,et al.  The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces , 2006 .

[37]  Ellen M. Voorhees,et al.  Using WordNet to disambiguate word senses for text retrieval , 1993, SIGIR.

[38]  Katrin Erk,et al.  Investigations on Word Senses and Word Usages , 2009, ACL.

[39]  Patrick Pantel,et al.  Discovering word senses from text , 2002, KDD.

[40]  James C. Bezdek,et al.  On cluster validity for the fuzzy c-means model , 1995, IEEE Trans. Fuzzy Syst..

[41]  Timothy Baldwin,et al.  unimelb: Topic Modelling-based Word Sense Induction , 2013, SemEval@NAACL-HLT.

[42]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[43]  Yee Whye Teh,et al.  Improving Word Sense Disambiguation Using Topic Features , 2007, EMNLP.

[44]  Nancy Ide,et al.  Word Sense Annotation of Polysemous Words by Multiple Annotators , 2010, LREC.

[45]  Philip Koehn,et al.  Statistical Machine Translation , 2010, EAMT.

[46]  David Jurgens,et al.  SemEval-2013 Task 13: Word Sense Induction for Graded and Non-Graded Senses , 2013, SemEval@NAACL-HLT.

[47]  Hinrich Schütze,et al.  Automatic Word Sense Discrimination , 1998, Comput. Linguistics.

[48]  Kevin Knight,et al.  Syntactic Re-Alignment Models for Machine Translation , 2007, EMNLP.

[49]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[50]  Rada Mihalcea,et al.  Utilizing Semantic Composition in Distributional Semantic Models for Word Sense Discrimination and Word Sense Disambiguation , 2012, 2012 IEEE Sixth International Conference on Semantic Computing.

[51]  Marine Carpuat,et al.  Improving Statistical Machine Translation Using Word Sense Disambiguation , 2007, EMNLP.

[52]  Dan Klein,et al.  A Generative Constituent-Context Model for Improved Grammar Induction , 2002, ACL.

[53]  David M. Blei,et al.  PUTOP: Turning Predominant Senses into a Topic Model for Word Sense Disambiguation , 2007, SemEval@ACL.