GNEG: Graph-Based Negative Sampling for word2vec

Negative sampling is an important component of word2vec for learning distributed word representations. We hypothesize that taking into account global, corpus-level information and generating a different noise distribution for each target word better satisfies the requirements of negative examples for each training word than the original frequency-based distribution. To this end, we pre-compute word co-occurrence statistics from the corpus and apply network algorithms, such as random walks, to the resulting co-occurrence graph. We test this hypothesis through a set of experiments whose results show that our approach improves performance on the word analogy task by about 5% and on word similarity tasks by about 1% compared to the skip-gram negative sampling baseline.
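
The following is a minimal sketch of the general idea described in the abstract: build a co-occurrence graph from the corpus, run a short random walk on it, and use the resulting per-word distributions as word-specific noise distributions for negative sampling. All function names, parameters, and the dense-matrix implementation are illustrative assumptions and not the paper's actual implementation (which pre-computes statistics at corpus scale, e.g. with corpus2graph).

```python
import numpy as np

def cooccurrence_matrix(sentences, vocab, window=5):
    """Count symmetric word co-occurrences within a fixed window
    (a simplified stand-in for the corpus-level statistics the paper pre-computes)."""
    idx = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in sentences:
        ids = [idx[w] for w in sent if w in idx]
        for i, wi in enumerate(ids):
            for wj in ids[max(0, i - window):i]:
                counts[wi, wj] += 1
                counts[wj, wi] += 1
    return counts

def random_walk_noise(counts, steps=2, smoothing=1e-8):
    """Turn the co-occurrence graph into one noise distribution per target word
    by taking `steps` steps of a random walk on the row-normalized graph."""
    P = counts + smoothing                        # avoid all-zero rows
    P = P / P.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    walk = np.linalg.matrix_power(P, steps)       # t-step walk probabilities
    return walk                                   # row i = noise distribution for word i

def sample_negatives(noise, target_id, k=5, rng=np.random.default_rng()):
    """Draw k negative samples for `target_id` from its word-specific noise
    distribution, instead of the global unigram^0.75 distribution of word2vec."""
    return rng.choice(noise.shape[1], size=k, p=noise[target_id])
```

In this sketch the only change relative to standard skip-gram negative sampling is where the negatives come from: each target word gets its own row of the random-walk matrix as its noise distribution, rather than all words sharing a single frequency-based distribution.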
