Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T)

Deep neural networks (DNNs) have shown remarkable success in a variety of machine learning applications. The capacity of these models (i.e., the number of parameters) endows them with expressive power and allows them to reach the desired performance. In recent years, there has been increasing interest in deploying DNNs on resource-constrained devices (e.g., mobile devices) with limited energy, memory, and computational budgets. To address this problem, we propose Entropy-Constrained Trained Ternarization (EC2T), a general framework for creating sparse and ternary neural networks that are efficient in terms of storage (e.g., at most two binary masks and two full-precision values are required to store a weight matrix) and computation (e.g., MAC operations are reduced to a few accumulations plus two multiplications). The approach consists of two steps. First, a super-network is created by scaling the dimensions of a pre-trained model (i.e., its width and depth). Subsequently, this super-network is simultaneously pruned (using an entropy constraint) and quantized (i.e., ternary values are assigned layer-wise) during training, resulting in a sparse and ternary network representation. We validate the proposed approach on the CIFAR-10, CIFAR-100, and ImageNet datasets, demonstrating its effectiveness in image classification tasks.
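
The storage and compute claims above can be made concrete with a small sketch. The following NumPy snippet is not the authors' implementation; variable names such as w_pos, w_neg, mask_pos, and mask_neg are illustrative assumptions. It stores a ternary weight matrix as two binary masks plus two full-precision values per layer, and checks that a matrix-vector product can then be computed with accumulations followed by only two multiplications per output.

```python
# Minimal sketch (assumed representation, not the paper's code): a ternary
# weight matrix kept as two binary masks plus two full-precision scalars,
# with a forward pass that needs only accumulations plus two multiplications
# per output unit.
import numpy as np

rng = np.random.default_rng(0)

# Two full-precision values learned per layer, and two binary masks marking
# which weights take each value; all remaining weights are pruned to zero.
w_pos, w_neg = 0.73, -0.58
mask_pos = rng.random((4, 8)) > 0.7
mask_neg = (rng.random((4, 8)) > 0.7) & ~mask_pos

# Dense reconstruction, used here only to verify the compressed forward pass.
W = w_pos * mask_pos + w_neg * mask_neg

x = rng.standard_normal(8)

# Compressed forward pass: sum the inputs selected by each mask (pure
# accumulation), then scale the two partial sums (two multiplications
# per output element).
acc_pos = mask_pos.astype(x.dtype) @ x
acc_neg = mask_neg.astype(x.dtype) @ x
y = w_pos * acc_pos + w_neg * acc_neg

assert np.allclose(y, W @ x)  # matches the dense matrix-vector product
```

In this form, a layer is described by two 1-bit masks and two scalars rather than a dense 32-bit weight matrix, which is where the storage savings described in the abstract come from.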
