Probabilistic FastText for Multi-Sense Word Embeddings

We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of its character n-gram vectors. This representation allows the model to share statistical strength across sub-word structures (e.g., Latin roots), producing accurate representations of rare, misspelled, or even unseen words. Moreover, each component of the mixture can capture a different word sense. Probabilistic FastText outperforms both FastText, which has no probabilistic model, and dictionary-level probabilistic embeddings, which do not incorporate subword structure, on several word-similarity benchmarks, including the English Rare Word dataset and foreign-language datasets. We also achieve state-of-the-art performance on benchmarks that measure the ability to discern different meanings. The proposed model is thus the first to achieve multi-sense representations while also enriching the semantics of rare words.
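
To make the representation concrete, the following is a minimal NumPy sketch of the core idea: each word is a K-component Gaussian mixture whose component means are sums of hashed character n-gram vectors, and word similarity is the inner product of the two mixture densities (the expected likelihood kernel). All names, the bucket count, the per-component n-gram tables, and the similarity routine are illustrative assumptions of this sketch, not the authors' implementation.

import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams with boundary markers, FastText-style; the padded
    word itself is also included as a feature."""
    padded = "<" + word + ">"
    grams = [padded]
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

class GaussianMixtureEmbedding:
    """Each word is a K-component Gaussian mixture with spherical covariances;
    the mean of component k is the sum of that component's n-gram vectors."""

    def __init__(self, dim=50, n_components=2, n_buckets=10_000, seed=0):
        rng = np.random.default_rng(seed)
        self.n_buckets = n_buckets
        # One hashed n-gram table per mixture component (an assumption of this
        # sketch; production systems use ~2M buckets and a deterministic hash).
        self.ngram_vecs = rng.normal(scale=0.1,
                                     size=(n_components, n_buckets, dim))
        self.log_sigma2 = np.zeros(n_components)   # spherical variance per component
        self.weights = np.full(n_components, 1.0 / n_components)  # mixture weights

    def component_means(self, word):
        # Works for any string, so rare, misspelled, or unseen words still
        # receive means composed from their sub-word pieces.
        ids = [hash(g) % self.n_buckets for g in char_ngrams(word)]
        return self.ngram_vecs[:, ids, :].sum(axis=1)  # shape (K, dim)

def expected_likelihood_kernel(model, w1, w2):
    """Inner product of two Gaussian mixture densities: a weighted sum over
    component pairs of N(mu_i; mu_j, (s_i^2 + s_j^2) I)."""
    mu1, mu2 = model.component_means(w1), model.component_means(w2)
    s2 = np.exp(model.log_sigma2)
    dim = mu1.shape[1]
    total = 0.0
    for i in range(len(mu1)):
        for j in range(len(mu2)):
            var = s2[i] + s2[j]
            diff = mu1[i] - mu2[j]
            log_n = -0.5 * (dim * np.log(2 * np.pi * var) + diff @ diff / var)
            total += model.weights[i] * model.weights[j] * np.exp(log_n)
    return total

Because the kernel sums over all component pairs, a word's different senses (components) can each align with a different neighbor, which is what lets a single mixture representation cover polysemy; training in the paper optimizes, roughly, a margin loss on the log of this kind of density inner product, though the exact energy and the per-component parameter sharing here are simplified.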
