Word Embedding Visualization Via Dictionary Learning

Word embedding techniques based on co-occurrence statistics have proved highly effective at capturing the semantic and syntactic properties of words as low-dimensional continuous vectors. In this work, we find that dictionary learning can decompose these word vectors into linear combinations of more elementary word factors. We demonstrate that many of the learned factors have surprisingly strong semantic or syntactic meaning, corresponding to factors previously identified manually by human inspection. Dictionary learning thus provides a powerful visualization tool for understanding word embedding representations. Furthermore, we show that the word factors help identify key semantic and syntactic differences in word analogy tasks and improve upon state-of-the-art word embedding techniques in these tasks by a large margin.
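
To make the decomposition concrete, the sketch below illustrates the general idea on pre-trained embeddings. It is not the authors' implementation: it substitutes scikit-learn's MiniBatchDictionaryLearning with non-negative codes for the paper's sparse-coding solver, and the factor count, sparsity penalty, and helper names (learn_word_factors, describe_factor) are illustrative assumptions rather than settings taken from the paper.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_word_factors(X, n_factors=1000, alpha=0.5):
    """Decompose word vectors X (n_words x dim) into sparse, non-negative
    combinations of learned 'word factors', so that X[i] ~= codes[i] @ factors.
    n_factors and alpha are assumed values, not the paper's settings."""
    dl = MiniBatchDictionaryLearning(
        n_components=n_factors,           # size of the factor dictionary
        alpha=alpha,                      # sparsity penalty on the codes
        positive_code=True,               # non-negative coefficients
        transform_algorithm="lasso_lars",
        random_state=0,
    )
    codes = dl.fit_transform(X)           # (n_words, n_factors) sparse codes
    factors = dl.components_              # (n_factors, dim) dictionary rows
    return codes, factors

def describe_factor(codes, words, k, topn=8):
    """Words whose code is largest on factor k hint at that factor's
    semantic or syntactic meaning."""
    order = np.argsort(codes[:, k])[::-1][:topn]
    return [words[j] for j in order]
```

In this sketch, X would hold pre-trained vectors (e.g., GloVe or word2vec) stacked row-wise in the same order as the word list, and inspecting describe_factor for the factors with the largest coefficients on a given word is one way to visualize what that word's embedding is composed of.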
