Learning Topic-Sensitive Word Representations

Distributed word representations are widely used for modeling words in NLP tasks. Most existing models learn a single representation per word and therefore cannot capture a word's different meanings. We present two approaches that learn multiple topic-sensitive representations per word using the Hierarchical Dirichlet Process. We observe that by modeling topics and integrating per-document topic distributions, we obtain representations that distinguish between the different meanings of a given word. Our models yield statistically significant improvements on the lexical substitution task, indicating that the commonly used single-vector representations, even when combined with contextual information, are insufficient for this task.
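For intuition, here is a minimal sketch of the underlying idea, assuming Python with gensim. It is not the models proposed in this paper (which integrate per-document topic distributions into the representations rather than using hard topic assignments): it infers topics with an HDP model, tags each token with its document's dominant topic, and trains skip-gram vectors over the tagged corpus, so that a word like "bank" receives a separate vector per topic. The toy corpus is purely illustrative.

```python
# A minimal sketch (not the paper's exact models): infer a topic
# distribution per document with gensim's HDP implementation, tag
# each token with its document's dominant topic, and train skip-gram
# vectors over the tagged corpus, yielding one embedding per
# (word, topic) pair.
from gensim.corpora import Dictionary
from gensim.models import HdpModel, Word2Vec

docs = [
    ["bank", "river", "water", "shore", "fishing"],
    ["bank", "money", "loan", "interest", "account"],
]

dictionary = Dictionary(docs)
bow = [dictionary.doc2bow(doc) for doc in docs]

# Non-parametric topic inference: HDP infers the number of topics
# from the data instead of fixing it in advance.
hdp = HdpModel(bow, id2word=dictionary, random_state=0)

tagged_docs = []
for doc, doc_bow in zip(docs, bow):
    topics = hdp[doc_bow]                      # [(topic_id, prob), ...]
    top = max(topics, key=lambda t: t[1])[0] if topics else 0
    # Append the dominant topic id to every token in the document.
    tagged_docs.append([f"{w}_t{top}" for w in doc])

# One skip-gram vector per (word, topic) pair; "bank" tagged with
# different topics can now occupy different regions of the space.
w2v = Word2Vec(tagged_docs, vector_size=50, window=2,
               min_count=1, sg=1, seed=0)
print(w2v.wv.most_similar(tagged_docs[0][0], topn=3))
```

With a realistic corpus, the nearest neighbours of the two tagged variants of "bank" would reflect the river and finance senses, respectively; the paper's soft-assignment models aim at the same separation without committing each document to a single topic.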
