Improving Word Representations via Global Context and Multiple Word Prototypes

Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
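To make the multi-prototype idea concrete, here is a minimal sketch (hypothetical function and variable names, not the paper's actual implementation): represent each occurrence of a word by the average embedding of the words in its context window, cluster those context vectors, and treat each cluster centroid as one sense-specific prototype.

```python
import numpy as np
from sklearn.cluster import KMeans

def multi_prototype_vectors(occurrences, embed, window=5, k=3):
    """Induce k sense prototypes for one word by clustering its contexts.

    occurrences: list of (tokens, position) pairs, one per use of the word,
                 where tokens is a list of strings and position indexes the word
    embed:       dict mapping token -> 1-D numpy array (single-prototype vectors)
    Returns an array of shape (k, dim): one prototype vector per induced sense.
    """
    contexts = []
    for tokens, pos in occurrences:
        # Collect up to `window` words on each side of the target occurrence.
        ctx = tokens[max(0, pos - window):pos] + tokens[pos + 1:pos + 1 + window]
        vecs = [embed[w] for w in ctx if w in embed]
        if vecs:
            contexts.append(np.mean(vecs, axis=0))
    X = np.vstack(contexts)  # assumes at least k non-empty contexts
    # L2-normalize so Euclidean k-means behaves like spherical k-means.
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km.cluster_centers_
```

At evaluation time a query context can be matched to its nearest prototype, or similarity can be averaged over prototype pairs weighted by context probability, which is the spirit of the context-sensitive similarity measures the paper evaluates on its new dataset.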
