Dynamic Word Embeddings via Skip-Gram Filtering

We present a probabilistic language model for time-stamped text data that tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in time, the embedding vectors are inferred from a probabilistic version of word2vec (Mikolov et al., 2013b). These embedding vectors are connected in time through a latent diffusion process. We describe two scalable variational inference algorithms (skip-gram smoothing and skip-gram filtering) that allow us to train the model jointly over all times, thus learning from all data while simultaneously allowing word and context vectors to drift. Experimental results on three different corpora demonstrate that our dynamic model infers word embedding trajectories that are more interpretable and lead to higher predictive likelihoods than competing methods based on static models trained separately on time slices.
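To make the model concrete, the sketch below simulates its two ingredients: a Gaussian random-walk (diffusion) prior that ties word and context vectors across time slices, and a Bernoulli skip-gram likelihood within each slice. It then runs a simple MAP-style forward pass as a stand-in for the paper's variational skip-gram filtering. All sizes, variances, count budgets, and the gradient update itself are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, not the paper's settings).
V, D, T = 50, 8, 5          # vocabulary, embedding dimension, time slices
sigma0, sigma_d = 1.0, 0.1  # prior scale at t = 0, diffusion scale per step

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Generative view: embeddings drift as a Gaussian random walk over time.
U = [sigma0 * rng.standard_normal((V, D))]   # word vectors u_i(t)
W = [sigma0 * rng.standard_normal((V, D))]   # context vectors w_j(t)
for t in range(1, T):
    U.append(U[-1] + sigma_d * rng.standard_normal((V, D)))
    W.append(W[-1] + sigma_d * rng.standard_normal((V, D)))

# Bernoulli skip-gram likelihood within each slice:
# p(word i and context j co-occur at t) = sigmoid(u_i(t) . w_j(t)).
n_pos = [rng.binomial(20, sigmoid(U[t] @ W[t].T)) for t in range(T)]
n_neg = [20 - n for n in n_pos]  # toy "negative sample" counts per pair

# MAP-style filtering step (a stand-in for variational skip-gram
# filtering): fit slice t's vectors to that slice's counts while a
# Gaussian tether keeps them close to the previous slice's estimate.
def map_filter_step(U_prev, W_prev, n_pos_t, n_neg_t, lr=0.05, steps=200):
    U_t, W_t = U_prev.copy(), W_prev.copy()
    for _ in range(steps):
        P = sigmoid(U_t @ W_t.T)               # predicted co-occurrence probs
        R = n_pos_t - (n_pos_t + n_neg_t) * P  # d log-likelihood / d logits
        gU = R @ W_t - (U_t - U_prev) / sigma_d**2    # likelihood + diffusion prior
        gW = R.T @ U_t - (W_t - W_prev) / sigma_d**2
        U_t += lr * gU / V
        W_t += lr * gW / V
    return U_t, W_t

U_hat, W_hat = U[0], W[0]  # pretend slice 0 is known; filter forward in time
for t in range(1, T):
    U_hat, W_hat = map_filter_step(U_hat, W_hat, n_pos[t], n_neg[t])
    err = np.mean((U_hat - U[t]) ** 2)
    print(f"slice {t}: mean squared error of filtered word vectors = {err:.3f}")
```

The tether term in `map_filter_step` is where the diffusion prior enters: with a small `sigma_d`, the update trusts the previous slice and vectors drift slowly; with a large `sigma_d`, each slice is fit almost independently, which recovers the static, separately trained baselines the paper compares against.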

[1] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. NIPS, 2016.

[2] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP, 2013.

[3] R. Mazo. On the Theory of Brownian Motion, 1973.

[4] Marc Teboulle, et al. Mirror Descent and Nonlinear Projected Subgradient Methods for Convex Optimization. Oper. Res. Lett., 2003.

[5] Pantelimon Stanica, et al. The Inverse of Banded Matrices. J. Comput. Appl. Math., 2013.

[6] T. Başar, et al. A New Approach to Linear Filtering and Prediction Problems, 2001.

[7] Andrew McCallum, et al. Word Representations via Gaussian Embedding. ICLR, 2014.

[8] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation. EMNLP, 2014.

[9] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.

[10] Param Vir Singh, et al. A Hidden Markov Model for Collaborative Filtering. MIS Q., 2010.

[11] Chong Wang, et al. Continuous Time Dynamic Topic Models. UAI, 2008.

[12] David M. Blei, et al. Exponential Family Embeddings. NIPS, 2016.

[13] Erez Lieberman Aiden, et al. Quantitative Analysis of Culture Using Millions of Digitized Books. Science, 2010.

[14] David M. Blei, et al. Dynamic Poisson Factorization. RecSys, 2015.

[15] Paulo E. Rauber, et al. Visualizing Time-Dependent Data Using Dynamic t-SNE. EuroVis, 2016.

[16] Yanwei Fu, et al. Semi-supervised Vocabulary-Informed Learning. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[17] Jure Leskovec, et al. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. ACL, 2016.

[18] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.

[19] Eyal Sagi, et al. Tracing Semantic Change with Latent Semantic Analysis, 2011.

[20] Rada Mihalcea, et al. Word Epoch Disambiguation: Finding How Words Change Over Time. ACL, 2012.

[21] Stephan Mandt, et al. Dynamic Word Embeddings. ICML, 2017.

[22] Omer Levy, et al. Neural Word Embedding as Implicit Matrix Factorization. NIPS, 2014.

[23] J. L. Roux. An Introduction to the Kalman Filter, 2003.

[24] Arkadi Nemirovski, et al. The Ordered Subsets Mirror Descent Optimization Method with Applications to Tomography. SIAM J. Optim., 2001.

[25] Geoffrey Zweig, et al. Linguistic Regularities in Continuous Space Word Representations. NAACL, 2013.

[26] Chong Wang, et al. Stochastic Variational Inference. J. Mach. Learn. Res., 2012.

[27] Sean Gerrish, et al. Black Box Variational Inference. AISTATS, 2013.

[28] Koray Kavukcuoglu, et al. Learning Word Embeddings Efficiently with Noise-Contrastive Estimation. NIPS, 2013.

[29] Andrew Y. Ng, et al. Parsing with Compositional Vector Grammars. ACL, 2013.

[30] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models. Machine Learning, 1999.

[31] Oren Barkan, et al. Bayesian Neural Word Embedding. AAAI, 2016.

[32] Max Welling, et al. Auto-Encoding Variational Bayes. ICLR, 2013.

[33] David M. Blei, et al. Variational Inference: A Review for Statisticians. arXiv, 2016.

[34] Steven Skiena, et al. Statistically Significant Detection of Linguistic Change. WWW, 2014.

[35] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space. ICLR, 2013.

[36] John D. Lafferty, et al. Dynamic Topic Models. ICML, 2006.

[37] Slav Petrov, et al. Temporal Analysis of Language through Neural Language Models. LTCSS@ACL, 2014.

[38] John W. Paisley, et al. A Collaborative Kalman Filter for Time-Evolving Dyadic Processes. IEEE International Conference on Data Mining (ICDM), 2014.

[39] Adler J. Perotte, et al. The Survival Filter: Joint Survival Analysis with a Latent Time Series. UAI, 2015.

[40] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.