Dependent Scalar Quantization For Neural Network Compression

Recent approaches to the compression of deep neural networks, such as the emerging standard on compression of neural networks for multimedia content description and analysis (MPEG-7 part 17), apply scalar quantization followed by entropy coding of the quantization indexes. In this paper we present an advanced method for quantizing neural network parameters that applies dependent scalar quantization (DQ) or trellis-coded quantization (TCQ), together with improved context modeling for the entropy coding of the quantization indexes. We show that, for relevant working points, the proposed method achieves a 5.778% bitrate reduction with virtually no loss of network performance (0.37%) on average, compared to the baseline methods of the second test model (NCTM) of MPEG-7 part 17.
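To make the idea of dependent scalar quantization concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes a design with two scalar quantizers and a 4-state trellis of the kind described for video coding: the state transition table `TRANS`, the level mapping in `reconstruct_level`, and the function names `dq_encode` and `dq_decode` are all assumptions made for illustration. The encoder runs a Viterbi search that minimizes squared error only; a practical encoder would also include a rate term and would pass the resulting indexes to a context-adaptive entropy coder.

```python
import math

# Assumed 4-state transition table: next_state = TRANS[state][k & 1].
# States 0 and 1 use quantizer Q0, states 2 and 3 use quantizer Q1.
TRANS = [[0, 2], [2, 0], [1, 3], [3, 1]]


def reconstruct_level(state, k, step):
    """Map quantization index k to a reconstruction value for the given state."""
    if state < 2:
        # Q0: even integer multiples of the step size.
        return 2 * k * step
    # Q1: odd integer multiples of the step size, plus zero.
    sign = (k > 0) - (k < 0)
    return (2 * k - sign) * step


def dq_encode(weights, step):
    """Viterbi search for the index sequence with minimum squared error."""
    cost = [0.0, math.inf, math.inf, math.inf]  # decoding starts in state 0
    paths = [[], None, None, None]
    for w in weights:
        new_cost = [math.inf] * 4
        new_paths = [None] * 4
        k0 = round(w / (2 * step))
        for s in range(4):
            if cost[s] == math.inf:
                continue
            # Three neighbouring indexes cover the nearest levels of either
            # quantizer and always include both parities.
            for k in (k0 - 1, k0, k0 + 1):
                err = (w - reconstruct_level(s, k, step)) ** 2
                ns = TRANS[s][k & 1]
                if cost[s] + err < new_cost[ns]:
                    new_cost[ns] = cost[s] + err
                    new_paths[ns] = paths[s] + [k]
        cost, paths = new_cost, new_paths
    best = min(range(4), key=lambda s: cost[s])
    return paths[best]


def dq_decode(indexes, step):
    """Reconstruct values by replaying the state machine over the indexes."""
    state = 0
    out = []
    for k in indexes:
        out.append(reconstruct_level(state, k, step))
        state = TRANS[state][k & 1]
    return out
```

Because the admissible reconstruction levels for one parameter depend on the parities of the preceding indexes, the decoder needs only the index sequence and the step size; the quantizer choice is never signalled explicitly. Storing a full path per state keeps the sketch short but is quadratic in the number of parameters, so a real encoder would use back-pointers instead.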
