Approximating Word Ranking and Negative Sampling for Word Embedding

CBOW (Continuous Bag-Of-Words) is one of the most commonly used techniques for generating word embeddings in various NLP tasks. However, it falls short of optimal performance because it treats all positive words uniformly and draws negative words from a simple sampling distribution. To address these issues, we propose OptRank, which optimizes word ranking and approximates negative sampling to improve word embeddings. Specifically, we first formalize word embedding as a ranking problem. We then weight positive words by their ranks, so that highly ranked words carry more importance, and adopt a dynamic sampling strategy to select informative negative words. In addition, we design an approximation method that computes word ranks efficiently. Empirical experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset across different sampling scales, especially when the sampled subset is small. The code and datasets can be obtained from https://github.com/ouououououou/OptRank
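To make the rank-weighted update concrete, below is a minimal sketch assuming a WARP-style sampled rank approximation, a logarithmic rank weight, and plain SGD on dot-product scores. These choices, along with all function names, hyperparameters, and the hinge-style margin, are illustrative assumptions for exposition and not the paper's exact formulation.

```python
import numpy as np

# Sketch: estimate the rank of a positive word by sampling negatives, then
# scale the update by a rank-dependent weight. All constants are illustrative.

rng = np.random.default_rng(0)

VOCAB_SIZE = 1000
DIM = 50
LR = 0.025
MAX_TRIALS = 100  # sampling budget for the rank approximation

W_in = rng.normal(scale=0.1, size=(VOCAB_SIZE, DIM))   # context (input) vectors
W_out = rng.normal(scale=0.1, size=(VOCAB_SIZE, DIM))  # target-word (output) vectors


def approx_rank(context_vec, pos_word):
    """Approximate the positive word's rank by sampling negative words until
    one scores at least as high (WARP-style estimate: rank ~ (|V|-1)/trials)."""
    pos_score = context_vec @ W_out[pos_word]
    for trials in range(1, MAX_TRIALS + 1):
        neg_word = int(rng.integers(VOCAB_SIZE))
        if neg_word != pos_word and context_vec @ W_out[neg_word] >= pos_score:
            return max(1, (VOCAB_SIZE - 1) // trials), neg_word
    return 1, None  # no violator found within the budget


def train_pair(context_words, pos_word):
    """One rank-weighted update for a (context window, positive word) pair."""
    context_vec = W_in[context_words].mean(axis=0)
    rank, neg_word = approx_rank(context_vec, pos_word)
    if neg_word is None:
        return
    weight = np.log1p(rank)  # rank-dependent weight (logarithmic choice is an assumption)
    margin = 1.0 - context_vec @ (W_out[pos_word] - W_out[neg_word])
    if margin > 0:
        step = LR * weight
        W_out[pos_word] += step * context_vec
        W_out[neg_word] -= step * context_vec
        W_in[context_words] += step * (W_out[pos_word] - W_out[neg_word]) / len(context_words)


# Toy usage: a context window of word indices predicting a target word index.
train_pair(context_words=np.array([3, 17, 42, 99]), pos_word=7)
```

The sampled rank estimate keeps each update cheap: instead of scoring the full vocabulary to find the positive word's exact rank, it stops as soon as an informative (higher-scoring) negative is found and uses the number of trials as a proxy for the rank.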
