Near-lossless Binarization of Word Embeddings

Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performance. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in memory and in computation, which makes them unsuitable for low-resource devices. The method proposed in this paper transforms real-valued embeddings into binary embeddings while preserving semantic information, requiring only 128 or 256 bits per vector. This leads to a small memory footprint and fast vector operations. The model is based on an autoencoder architecture, which also allows the original vectors to be reconstructed from the binary ones. Experimental results on semantic similarity, text classification and sentiment analysis tasks show that binarizing word embeddings costs only ~2% in accuracy while reducing vector size by 97%. Furthermore, a top-k benchmark demonstrates that querying with these binary vectors is 30 times faster than with real-valued vectors.
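
To illustrate why binary codes enable such fast vector operations (this sketch is not from the paper itself): a 256-bit code packed into four 64-bit words can be compared to another code with a handful of XOR and popcount instructions, whereas cosine similarity over a 300-dimensional float vector needs hundreds of multiply-adds. A minimal Python sketch follows, assuming codes are stored as packed NumPy uint64 arrays; the names `hamming_distance` and `top_k` are illustrative, not the paper's API.

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between two binary codes packed as uint64 words."""
    # XOR yields a 1 bit wherever the two codes differ.
    diff = np.bitwise_xor(a, b)
    # Popcount each 64-bit word; bin().count("1") is portable
    # (np.bitwise_count, available in NumPy >= 2.0, would be faster).
    return sum(bin(int(w)).count("1") for w in diff)

def top_k(query: np.ndarray, codes: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k codes closest to `query` in Hamming space."""
    dists = np.array([hamming_distance(query, c) for c in codes])
    return np.argsort(dists)[:k]

# Usage: a toy vocabulary of 10,000 words with 256-bit codes,
# each packed as 4 uint64 words.
rng = np.random.default_rng(0)
codes = rng.integers(0, 2**63, size=(10_000, 4), dtype=np.uint64)
print(top_k(codes[42], codes, k=5))  # nearest neighbors of word 42
```

In an optimized implementation the inner loop compiles down to native XOR/POPCNT instructions over contiguous memory, which is the kind of saving behind the 30x top-k speedup reported in the abstract.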
