Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

In this paper, we propose LexVec, a new method for generating distributed word representations. LexVec performs a low-rank, weighted factorization of the positive pointwise mutual information (PPMI) matrix via stochastic gradient descent, using a weighting scheme that penalizes errors on frequent co-occurrences more heavily while still accounting for negative co-occurrences. Evaluation on word similarity and analogy tasks shows that LexVec matches, and often outperforms, state-of-the-art methods on many of these tasks.
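
To make the factorization concrete, the sketch below is a minimal illustration, not the authors' reference implementation: it uses plain NumPy, a toy corpus, and illustrative hyperparameters of our own choosing. It builds a PPMI matrix from windowed co-occurrence counts and then fits word and context vectors by SGD on squared reconstruction error, visiting matrix cells drawn from sliding windows (window sampling) and from the unigram distribution (negative sampling), so frequent co-occurrences are updated, and therefore penalized, more often.

# Minimal sketch of PPMI factorization with window and negative sampling.
# Corpus, hyperparameters, and variable names are illustrative assumptions.
import numpy as np

corpus = ("the quick brown fox jumps over the lazy dog "
          "the quick dog runs over the lazy fox").split()
window, dim, negatives, lr, epochs = 2, 25, 5, 0.025, 100

vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Co-occurrence counts from symmetric context windows.
counts = np.zeros((V, V))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            counts[idx[w], idx[corpus[j]]] += 1

# PPMI matrix: max(0, log(P(w, c) / (P(w) P(c)))), with empty cells set to 0.
total = counts.sum()
pw = counts.sum(axis=1) / total
pc = counts.sum(axis=0) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((counts / total) / np.outer(pw, pc))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, dim))     # target-word vectors
C = rng.normal(scale=0.1, size=(V, dim))     # context-word vectors
unigram = counts.sum(axis=1) / counts.sum()  # negative-sampling distribution

def sgd_step(w, c):
    # One squared-error step on cell (w, c) of the PPMI matrix.
    err = W[w] @ C[c] - ppmi[w, c]
    gw, gc = err * C[c], err * W[w]
    W[w] -= lr * gw
    C[c] -= lr * gc

for _ in range(epochs):
    for i, w in enumerate(corpus):
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if j == i:
                continue
            wi, ci = idx[w], idx[corpus[j]]
            sgd_step(wi, ci)                      # positive pair from the window
            for n in rng.choice(V, size=negatives, p=unigram):
                sgd_step(wi, n)                   # negative sample

# Nearest neighbour of "fox" by cosine similarity, as a quick sanity check.
vecs = W / np.linalg.norm(W, axis=1, keepdims=True)
print(vocab[int(np.argsort(-(vecs @ vecs[idx["fox"]]))[1])])

Because negative samples are drawn independently of the observed contexts, most of them land on cells whose PPMI is zero, which is how this sketch approximates the paper's handling of negative co-occurrences.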
