A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution

Most existing word embedding methods can be categorized into Neural Embedding Models and Matrix Factorization (MF)-based methods. However, some models lack a clear probabilistic interpretation, and MF-based methods, typically solved using Singular Value Decomposition (SVD), may incur loss of corpus information. In addition, it is desirable to incorporate global latent factors, such as topics, sentiments, or writing styles, into the word embedding model. Since generative models provide a principled way to incorporate latent factors, we propose a generative word embedding model, which is easy to interpret and can serve as a basis for more sophisticated latent factor models. The model inference reduces to a low-rank weighted positive semidefinite approximation problem. Its optimization is approached by eigendecomposition on a submatrix, followed by online blockwise regression, which is scalable and avoids the information loss incurred by SVD. In experiments on seven common benchmark datasets, our vectors are competitive with word2vec and better than other MF-based methods.
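The abstract only names the two optimization steps. As a rough, hedged illustration (not the authors' implementation), the sketch below shows how a rank-r positive semidefinite approximation of a symmetric core-word statistics submatrix can be obtained by truncated eigendecomposition, and how embeddings for a further block of words could then be fitted by ridge regression against the core embeddings. All names (G, core, rest, rank, l2) are hypothetical placeholders.

```python
# Hedged sketch: low-rank PSD approximation of a core submatrix via
# eigendecomposition, then blockwise ridge regression for additional words.
# Illustrates the two named steps only; not the paper's exact algorithm.
import numpy as np

def core_embeddings(G_core, rank):
    """Rank-`rank` PSD approximation of the symmetric matrix G_core.

    Returns V of shape (n_core, rank) such that V @ V.T approximates G_core,
    keeping only the largest non-negative eigenvalues (enforcing PSD).
    """
    eigvals, eigvecs = np.linalg.eigh(G_core)       # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:rank]          # indices of the top-`rank` eigenpairs
    lam = np.clip(eigvals[idx], 0.0, None)          # drop negative eigenvalues -> PSD
    return eigvecs[:, idx] * np.sqrt(lam)

def regress_block(G_block, V_core, l2=1e-3):
    """Fit embeddings V_block for a block of new words so that
    V_block @ V_core.T approximates G_block, with an L2 (ridge) penalty."""
    A = V_core.T @ V_core + l2 * np.eye(V_core.shape[1])
    return np.linalg.solve(A, V_core.T @ G_block.T).T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((80, 20))
    G = M @ M.T                                     # toy symmetric "PMI-like" matrix
    core, rest = G[:50, :50], G[50:, :50]           # core submatrix and a new word block
    V_core = core_embeddings(core, rank=10)
    V_rest = regress_block(rest, V_core)
    print(V_core.shape, V_rest.shape)               # (50, 10) (30, 10)
```

In this toy setting, the eigendecomposition is only ever applied to the small core submatrix, and each additional block of words is solved independently against the fixed core embeddings, which is what makes a blockwise, online scheme scalable compared with a full SVD of the entire matrix.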
