Differentiable Product Quantization for End-to-End Embedding Compression

Embedding layers are commonly used to map discrete symbols into continuous embedding vectors that reflect their semantic meanings. Despite their effectiveness, the number of parameters in an embedding layer grows linearly with the number of symbols, posing a critical challenge under memory and storage constraints. In this work, we propose a generic, end-to-end learnable compression framework termed differentiable product quantization (DPQ). We present two instantiations of DPQ that leverage different approximation techniques to enable differentiability in end-to-end learning. Our method can readily serve as a drop-in replacement for any existing embedding layer. Empirically, DPQ achieves compression ratios of 14-238$\times$ at negligible or no performance cost on 10 datasets across three different language tasks.
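The following is a minimal sketch, not the authors' implementation, of a product-quantized embedding layer trained end-to-end with a straight-through approximation, assuming PyTorch. The class and argument names (DPQEmbedding, num_groups, codebook_size) are illustrative placeholders rather than the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DPQEmbedding(nn.Module):
    """Sketch of a differentiable product-quantized embedding layer."""

    def __init__(self, vocab_size, embed_dim, num_groups=4, codebook_size=16):
        super().__init__()
        assert embed_dim % num_groups == 0
        self.num_groups, self.codebook_size = num_groups, codebook_size
        sub_dim = embed_dim // num_groups
        # Per-symbol "query" vectors used only to select codes during training.
        self.query = nn.Parameter(torch.randn(vocab_size, num_groups, sub_dim) * 0.1)
        # One small codebook of centroids per group; at inference only the
        # integer code table and these codebooks need to be stored.
        self.codebooks = nn.Parameter(torch.randn(num_groups, codebook_size, sub_dim) * 0.1)

    def forward(self, token_ids):
        flat = token_ids.reshape(-1)
        q = self.query[flat]                                                    # (N, G, d/G)
        # Squared distance from each query sub-vector to every centroid in its group.
        dists = ((q.unsqueeze(2) - self.codebooks.unsqueeze(0)) ** 2).sum(-1)   # (N, G, K)
        hard = F.one_hot(dists.argmin(-1), self.codebook_size).float()          # discrete codes
        soft = F.softmax(-dists, dim=-1)
        # Straight-through trick: hard assignment in the forward pass,
        # gradients flow through the soft assignment in the backward pass.
        assign = hard + soft - soft.detach()
        out = torch.einsum('ngk,gkd->ngd', assign, self.codebooks)              # gather centroids
        return out.reshape(*token_ids.shape, -1)                                # (..., embed_dim)
```

After training, each symbol is represented by num_groups integer codes of log2(codebook_size) bits each plus the shared codebooks, which is where the compression over a dense vocab_size x embed_dim table comes from in this sketch.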
