Riemannian Optimization for Skip-Gram Negative Sampling

The Skip-Gram Negative Sampling (SGNS) word embedding model, well known through its implementation in the "word2vec" software, is usually optimized by stochastic gradient descent. However, optimizing the SGNS objective can be viewed as searching for a good matrix under a low-rank constraint. The standard way to solve this type of problem is to apply the Riemannian optimization framework, optimizing the SGNS objective over the manifold of matrices of the required low rank. In this paper, we propose an algorithm that optimizes the SGNS objective using Riemannian optimization and demonstrate its superiority over popular competitors, such as the original method used to train SGNS and SVD over the SPPMI matrix.
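To make the approach concrete, below is a minimal sketch of one Riemannian gradient step for the SGNS objective in its matrix form, where entry x_wc of the low-rank matrix X scores a word-context pair. The names here (sgns_grad, riemannian_step, D, w_counts, c_counts, k, lr) are illustrative assumptions rather than the paper's implementation, and for simplicity the retraction is a truncated SVD applied to an ambient-space step, not the projector-splitting retraction with tangent-space projection that a full treatment would use.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_grad(X, D, k, w_counts, c_counts, total_pairs):
    """Euclidean gradient of the SGNS objective at X.

    D[w, c] is the corpus count of the pair (w, c); the second term
    comes from k negative samples drawn from the unigram distribution,
    following the matrix form of SGNS (Levy and Goldberg, 2014).
    """
    neg = k * np.outer(w_counts, c_counts) / total_pairs
    return D * sigmoid(-X) - neg * sigmoid(X)

def riemannian_step(U, S, Vt, grad, lr):
    """One projected-gradient ascent step on the manifold of rank-d matrices.

    Takes a gradient step in the ambient space and retracts the result
    back onto the manifold by keeping the top-d factors of its SVD
    (a standard, if simplistic, retraction).
    """
    d = S.shape[0]
    Y = U @ np.diag(S) @ Vt + lr * grad            # ascent step (we maximize)
    U2, S2, Vt2 = np.linalg.svd(Y, full_matrices=False)
    return U2[:, :d], S2[:d], Vt2[:d, :]           # truncate to rank d
```

Once the iterates converge to X ≈ U diag(S) Vᵀ, word embeddings can be read off from the low-rank factors; one natural choice is W = U diag(√S), splitting the singular values evenly between the word and context factors.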
