Random walks on discourse spaces: a new generative language model with applications to semantic word embeddings

Semantic word embeddings represent the meaning of a word as a vector. Methods for creating them include Vector Space Methods (VSMs) such as Latent Semantic Analysis (LSA), matrix factorization, generative text models such as Topic Models, and neural networks. A flurry of work followed the papers of Mikolov et al.~\cite{mikolov2013efficient}, which showed that word analogy tasks can be solved remarkably well by exploiting linear structure in word embeddings, even though the embeddings were produced by highly nonlinear, energy-based models. No clear explanation is known for why such linear structure emerges in low-dimensional embeddings. This paper presents a log-linear generative model, related to that of~\citet{mnih2007three}, which treats the generation of a text corpus as a random walk in a latent discourse space. A novel methodological twist is that the model is solved in closed form by integrating out the random walk, yielding a simple method for constructing word embeddings. Experiments support both the modeling assumptions and the efficacy of the resulting embeddings for solving analogies. The simple model links and provides theoretical support for several prior embedding methods, and offers interpretations for various linear-algebraic structures observed in word embeddings obtained from nonlinear techniques.
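
To make the generative process concrete, the sketch below simulates the kind of random-walk model the abstract describes: a discourse vector drifts slowly on the unit sphere, and at each step a word is emitted from a log-linear distribution over word vectors. This is a minimal illustration only; the vocabulary size, dimension, step size, and variable names are assumptions for the toy example, not values or code from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy sizes (assumptions, not the paper's settings).
V, d = 1000, 50                                        # vocabulary size, embedding dimension
word_vectors = rng.normal(size=(V, d)) / np.sqrt(d)    # one vector v_w per word w

def normalize(x):
    return x / np.linalg.norm(x)

def generate_corpus(num_steps, step_size=0.05):
    """Emit one word per time step from a log-linear model driven by a
    slowly drifting discourse vector c_t on the unit sphere."""
    c = normalize(rng.normal(size=d))                  # initial discourse vector c_0
    corpus = []
    for _ in range(num_steps):
        # Log-linear emission: Pr[word w at time t | c_t] is proportional to exp(<c_t, v_w>).
        logits = word_vectors @ c
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        corpus.append(rng.choice(V, p=probs))
        # Slow random walk: small Gaussian step, projected back onto the unit sphere.
        c = normalize(c + step_size * rng.normal(size=d))
    return corpus

corpus = generate_corpus(10_000)

Because the discourse vector changes only slightly between consecutive steps, nearby words are emitted from nearly the same distribution. This slow drift is what makes local co-occurrence counts informative about the word vectors, and it is the structure the paper exploits when it integrates out the random walk to obtain closed-form expressions and a simple embedding-construction method.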

[1] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.

[2] Yoshua Bengio, et al. Neural Probabilistic Language Models, 2006.

[3] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[4] T. Landauer, et al. Indexing by Latent Semantic Analysis, 1990.

[5] Omer Levy, et al. Neural Word Embedding as Implicit Matrix Factorization, 2014, NIPS.

[6] Sanjeev Arora, et al. A Latent Variable Model Approach to PMI-based Word Embeddings, 2015, TACL.

[7] Dan Klein, et al. When and why are log-linear models self-normalizing?, 2015, NAACL.

[8] Karl Stratos, et al. Spectral Learning of Latent-Variable PCFGs, 2012, ACL.

[9] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis, 1990.

[10] Geoffrey E. Hinton, et al. Three new graphical models for statistical language modelling, 2007, ICML.

[11] Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML.

[12] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[13] Gal Chechik, et al. Euclidean Embedding of Co-occurrence Data, 2004, J. Mach. Learn. Res.

[14] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[15] David B. Dunson, et al. Probabilistic topic models, 2012, Commun. ACM.

[16] Omer Levy, et al. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, 2014, arXiv.

[17] Geoffrey E. Hinton, et al. Distributed Representations, 1986, The Philosophy of Artificial Intelligence.

[18] F. Black, et al. The Pricing of Options and Corporate Liabilities, 1973, Journal of Political Economy.

[19] Omer Levy, et al. Linguistic Regularities in Sparse and Explicit Word Representations, 2014, CoNLL.

[20] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.

[21] Geoffrey Zweig, et al. Linguistic Regularities in Continuous Space Word Representations, 2013, NAACL.

[22] Patrick Pantel, et al. From Frequency to Meaning: Vector Space Models of Semantics, 2010, J. Artif. Intell. Res.

[23] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[24] Tommi S. Jaakkola, et al. Word Embeddings as Metric Recovery in Semantic Spaces, 2016, TACL.

[25] Thomas Hofmann, et al. Probabilistic Latent Semantic Analysis, 1999, UAI.

[26] Sham M. Kakade, et al. A Linear Dynamical System Model for Text, 2015, ICML.

[27] John D. Lafferty, et al. Dynamic topic models, 2006, ICML.

[28] Elie Bienenstock, et al. Sphere Embedding: An Application to Part-of-Speech Induction, 2010, NIPS.