Weightless: Lossy Weight Encoding For Deep Neural Network Compression

The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually by applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding co-designed with weight simplification techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that saves space at the cost of introducing random errors. By leveraging the ability of neural networks to tolerate these imperfections and re-training around the errors, the proposed technique, named Weightless, can compress weights by up to 496x without loss of model accuracy. This results in up to a 1.51x improvement over the state-of-the-art.
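
To make the idea in the abstract concrete, below is a minimal Python sketch of a Bloomier-filter-style key-value encoding in the spirit of what the paper describes: store only the surviving (nonzero, quantized) weights of a pruned layer, and accept that lookups for pruned positions occasionally return spurious values. This is not the authors' implementation; the class name `BloomierFilter`, the parameters `cell_bits`, `value_bits`, and the 1.3x table sizing are illustrative assumptions.

```python
import hashlib
import random


class BloomierFilter:
    """Minimal Bloomier-filter-style map from weight indices to small values.

    Keys stored at construction always decode to their value. Absent keys
    usually decode to 0 ("pruned weight"), but with probability about
    2**value_bits / 2**cell_bits they decode to a spurious nonzero value --
    the random errors that retraining is expected to absorb.
    """

    def __init__(self, kv, k=3, cell_bits=8, value_bits=4, max_tries=64):
        assert value_bits <= cell_bits
        assert all(0 < v < (1 << value_bits) for v in kv.values())
        self.k, self.cell_bits, self.value_bits = k, cell_bits, value_bits
        self.cell_mask = (1 << cell_bits) - 1
        self.m = max(16, int(1.3 * len(kv)) + 1)  # ~1.23x is the k=3 threshold
        for _ in range(max_tries):  # retry with a fresh seed if peeling fails
            self.seed = random.getrandbits(32)
            order = self._peel(list(kv))
            if order is not None:
                break
        else:
            raise RuntimeError("construction failed; increase m or max_tries")
        self.table = [0] * self.m
        # Assign cells in reverse peeling order: each key owns one cell that no
        # previously assigned key touches, so earlier equations stay satisfied.
        for key, own in reversed(order):
            acc = self._mask(key) ^ kv[key]
            for s in self._slots(key):
                if s != own:
                    acc ^= self.table[s]
            self.table[own] = acc

    def _hash(self, key, salt):
        h = hashlib.blake2b(f"{self.seed}:{salt}:{key}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "little")

    def _slots(self, key):
        return [self._hash(key, i) % self.m for i in range(self.k)]

    def _mask(self, key):
        return self._hash(key, "mask") & self.cell_mask

    def _peel(self, keys):
        """Order keys so each owns a slot untouched by keys peeled after it."""
        slots = {key: self._slots(key) for key in keys}
        users = {}
        for key in keys:
            for s in set(slots[key]):
                users.setdefault(s, set()).add(key)
        order, remaining, progress = [], set(keys), True
        while remaining and progress:
            progress = False
            for key in list(remaining):
                own = next((s for s in slots[key]
                            if len(users[s]) == 1 and slots[key].count(s) == 1),
                           None)
                if own is not None:
                    order.append((key, own))
                    remaining.discard(key)
                    for s in set(slots[key]):
                        users[s].discard(key)
                    progress = True
        return order if not remaining else None

    def get(self, key):
        word = self._mask(key)
        for s in self._slots(key):
            word ^= self.table[s]
        # Out-of-range words signal "not stored": treat the weight as zero.
        return word if word < (1 << self.value_bits) else 0


# Hypothetical usage: keep only the surviving weights of a pruned, quantized
# layer (index -> cluster id), then read positions back out.
nonzero = {3: 5, 17: 2, 42: 9, 256: 1}
bf = BloomierFilter(nonzero)
assert all(bf.get(i) == v for i, v in nonzero.items())  # stored keys are exact
spurious = sum(bf.get(i) != 0 for i in range(1024) if i not in nonzero)
print(f"spurious nonzero reads among 1020 pruned positions: {spurious}")
```

In this sketch, raising cell_bits lowers the false-positive rate at the cost of a larger table, which is the space-versus-error trade-off the abstract alludes to; in the paper, the remaining errors are absorbed by re-training the network around them rather than by eliminating them.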
