Improving Word Representations via Global Context and Multiple Word Prototypes

Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
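To make the multi-prototype idea concrete, here is a minimal sketch (hypothetical function and variable names, not the paper's actual implementation): represent each occurrence of a word by the average embedding of the words in its context window, cluster those context vectors, and treat each cluster centroid as one sense-specific prototype.

```python
import numpy as np
from sklearn.cluster import KMeans

def multi_prototype_vectors(occurrences, embed, window=5, k=3):
    """Induce k sense prototypes for one word by clustering its contexts.

    occurrences: list of (tokens, position) pairs, one per use of the word,
                 where tokens is a list of strings and position indexes the word
    embed:       dict mapping token -> 1-D numpy array (single-prototype vectors)
    Returns an array of shape (k, dim): one prototype vector per induced sense.
    """
    contexts = []
    for tokens, pos in occurrences:
        # Collect up to `window` words on each side of the target occurrence.
        ctx = tokens[max(0, pos - window):pos] + tokens[pos + 1:pos + 1 + window]
        vecs = [embed[w] for w in ctx if w in embed]
        if vecs:
            contexts.append(np.mean(vecs, axis=0))
    X = np.vstack(contexts)  # assumes at least k non-empty contexts
    # L2-normalize so Euclidean k-means behaves like spherical k-means.
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km.cluster_centers_
```

At evaluation time a query context can be matched to its nearest prototype, or similarity can be averaged over prototype pairs weighted by context probability, which is the spirit of the context-sensitive similarity measures the paper evaluates on its new dataset.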
