Streaming Word Embeddings with the Space-Saving Algorithm

We develop a streaming (one-pass, bounded-memory) word embedding algorithm based on the canonical skip-gram with negative sampling model implemented in word2vec, using the Space-Saving algorithm to maintain the vocabulary within a fixed memory budget. We compare the streaming algorithm to word2vec empirically, both by measuring the cosine similarity between word pairs under each method and by applying each to the downstream task of hashtag prediction on a two-month interval of the Twitter sample stream. We then discuss the results of these experiments, concluding that they partially validate our approach as a streaming replacement for word2vec. Finally, we examine potential failure modes and suggest directions for future work.
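
The abstract leaves the vocabulary mechanism implicit, so the sketch below illustrates the Space-Saving counter named in the title as it could be used to track a bounded vocabulary over an unbounded token stream. This is a minimal dict-based sketch, not the paper's implementation; the `capacity` value and the toy token stream in the usage lines are illustrative assumptions.

```python
import heapq


class SpaceSaving:
    """Space-Saving frequent-items counter (Metwally et al., 2005).

    Tracks approximate counts for at most `capacity` distinct items,
    so memory stays bounded no matter how long the stream runs.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # item -> estimated count (may overestimate)
        self.errors = {}  # item -> upper bound on the overestimation

    def update(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest estimated count; the
            # newcomer inherits that count plus one, and the evicted
            # count bounds how far the new estimate can overshoot.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top(self, n):
        """Return the n items with the largest estimated counts."""
        return heapq.nlargest(n, self.counts.items(), key=lambda kv: kv[1])


# Toy usage: one pass, bounded memory (a stand-in for a Twitter stream).
token_stream = ["a", "b", "a", "c", "a", "c"]
sketch = SpaceSaving(capacity=2)
for token in token_stream:
    sketch.update(token)
print(sketch.top(2))  # [('a', 3), ('c', 3)]; 'c' is overestimated by errors['c'] = 1
```

The original paper backs this with a Stream-Summary structure that finds the minimum-count item in constant time; the linear scan in `update` trades that efficiency for brevity.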

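For the embedding updates themselves, the abstract says only that the method is based on skip-gram with negative sampling, so the following is a hedged sketch of the canonical SGNS gradient step on a single (center, context) pair, assuming dense NumPy vectors and a learning rate `lr`; how negatives are drawn in the streaming setting is not specified here, and the uniform toy sampling below is an assumption.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sgns_step(center, context, negatives, lr=0.025):
    """One stochastic-gradient step of skip-gram with negative sampling.

    Ascends log sigma(context . center) + sum_n log sigma(-neg_n . center),
    modifying all vectors in place as in canonical word2vec.
    """
    center_grad = np.zeros_like(center)

    # Positive pair: pull the context vector toward the center word.
    g = lr * (1.0 - sigmoid(context @ center))
    center_grad += g * context
    context += g * center

    # Negative samples: push sampled word vectors away from the center.
    for neg in negatives:
        g = -lr * sigmoid(neg @ center)
        center_grad += g * neg
        neg += g * center

    center += center_grad


# Toy usage with random vectors and uniformly drawn negatives (an
# assumption; word2vec samples negatives by smoothed unigram frequency).
rng = np.random.default_rng(0)
dim = 50
center = rng.normal(scale=0.1, size=dim)
context = rng.normal(scale=0.1, size=dim)
negatives = [rng.normal(scale=0.1, size=dim) for _ in range(5)]
sgns_step(center, context, negatives)
```

A streaming variant must also decide what happens to a word's vector when the Space-Saving sketch evicts it; reinitializing that slot for the incoming word is one plausible choice, but the abstract does not say.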