Incremental Skip-gram Model with Negative Sampling

This paper explores an incremental training strategy for the skip-gram model with negative sampling (SGNS) from both empirical and theoretical perspectives. Existing neural word embedding methods, including SGNS, are multi-pass algorithms and thus cannot perform incremental model updates. To address this problem, we present a simple incremental extension of SGNS and provide a thorough theoretical analysis to demonstrate its validity. Experiments demonstrate both the correctness of the theoretical analysis and the practical usefulness of the incremental algorithm.
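To make the setting concrete, below is a minimal sketch of what a per-token incremental SGNS update could look like. It is a hypothetical illustration, not the paper's exact algorithm: the class name IncrementalSGNS, the method learn_token, and all hyperparameter defaults are assumptions, and the published method may additionally rely on techniques such as adaptive learning rates or bounded vocabularies. The point the sketch captures is that the word counts, the negative-sampling noise distribution, and the embeddings are all updated as each token arrives, so no second pass over the corpus is required.

    import numpy as np
    from collections import defaultdict

    class IncrementalSGNS:
        """Hypothetical per-token SGNS trainer (illustrative sketch only)."""

        def __init__(self, dim=50, window=2, neg=5, lr=0.025, power=0.75, seed=0):
            self.dim, self.window, self.neg = dim, window, neg
            self.lr, self.power = lr, power
            self.rng = np.random.default_rng(seed)
            self.count = defaultdict(int)   # unigram counts, updated per token
            self.w_in = {}                  # target-word vectors
            self.w_out = {}                 # context-word vectors
            self.buffer = []                # trailing context window

        def _vec(self, table, word):
            # Lazily initialize a vector the first time a word is seen.
            if word not in table:
                table[word] = (self.rng.random(self.dim) - 0.5) / self.dim
            return table[word]

        def _negative_samples(self, k):
            # Draw from the *current* smoothed unigram distribution; it drifts
            # as new tokens arrive, which is the crux of the incremental setting.
            words = list(self.count)
            p = np.array([self.count[w] for w in words], dtype=float) ** self.power
            return self.rng.choice(words, size=k, p=p / p.sum())

        def _sgd_pair(self, target, context, label):
            # One SGD step on the negative-sampling logistic loss.
            v = self._vec(self.w_in, target)
            u = self._vec(self.w_out, context)
            g = self.lr * (1.0 / (1.0 + np.exp(-v @ u)) - label)
            v_old = v.copy()
            v -= g * u          # in-place: updates the stored vector
            u -= g * v_old

        def learn_token(self, token):
            self.count[token] += 1
            for ctx in self.buffer:
                for t, c in ((token, ctx), (ctx, token)):
                    self._sgd_pair(t, c, 1.0)                 # positive pair
                    for noise in self._negative_samples(self.neg):
                        self._sgd_pair(t, noise, 0.0)         # negative pairs
            self.buffer = (self.buffer + [token])[-self.window:]

    # Toy usage: feed a token stream one word at a time.
    model = IncrementalSGNS()
    for tok in "the cat sat on the mat".split():
        model.learn_token(tok)

Recomputing the noise distribution from scratch on every draw, as done here for clarity, would be expensive at scale; a practical implementation would maintain the distribution incrementally (for example, with a streaming sampling structure) rather than rebuilding it per token.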
