Continuous Word Embedding Fusion via Spectral Decomposition

Word embeddings have become a mainstream tool in statistical natural language processing. Practitioners often use pre-trained word vectors that were trained on large generic text corpora and are readily available on the web. However, pre-trained word vectors often lack important words from specific domains, so it is desirable to extend the vocabulary and embed new words into a set of pre-trained word vectors. In this paper, we present an efficient method for incorporating new words from a specialized domain corpus into pre-trained generic word embeddings. We build on the established view of word embeddings as matrix factorizations to present a spectral algorithm for this task. Experiments on several domain-specific corpora with specialized vocabularies demonstrate that our method embeds the new words efficiently into the original embedding space. Compared to competing methods, our method is faster, parameter-free, and deterministic.
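
As a rough illustration of the matrix-factorization view underlying this task (not the paper's exact algorithm), the sketch below embeds out-of-vocabulary domain words into a fixed pre-trained embedding space: it builds a PPMI block from co-occurrence counts of the new words with the known vocabulary and fits new-word vectors by least squares against the pre-trained matrix. The function names, the PPMI weighting, and the least-squares projection are assumptions made for illustration.

```python
import numpy as np

def ppmi_block(cooc, eps=1e-12):
    """Positive pointwise mutual information from raw co-occurrence counts.

    cooc: (V_new x V_old) counts of each new word with the pre-trained vocabulary.
    """
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)   # marginal counts of new words
    col = cooc.sum(axis=0, keepdims=True)   # marginal counts of known words
    pmi = np.log((cooc * total + eps) / (row @ col + eps))
    return np.maximum(pmi, 0.0)

def embed_new_words(W_old, cooc_new_old):
    """Project new domain words into the span of pre-trained vectors.

    W_old: (V_old x d) pre-trained embedding matrix (kept fixed).
    cooc_new_old: (V_new x V_old) co-occurrence counts from the domain corpus.
    Returns a (V_new x d) matrix of vectors for the new words.
    """
    M = ppmi_block(cooc_new_old)
    # Least-squares fit: find W_new such that W_new @ W_old.T approximates M,
    # i.e. explain the observed PPMI rows within the existing embedding space.
    X, *_ = np.linalg.lstsq(W_old, M.T, rcond=None)   # X has shape (d x V_new)
    return X.T

# Hypothetical usage: W_old from a pre-trained model, counts from a domain corpus.
# W_new = embed_new_words(W_old, cooc_new_old)
```

Because the pre-trained vectors stay fixed and the fit is a closed-form linear solve, such a projection is deterministic and requires no tuning, which mirrors the properties claimed for the spectral method above.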
