Bit-Regularized Optimization of Neural Nets

We present a novel optimization strategy for training neural networks that we call “BitNet”. The parameters of neural networks are usually unconstrained, with a dynamic range spanning all real values. Our key idea is to limit the expressive power of the network by dynamically controlling the range and set of values that the parameters can take. We formulate this idea as an end-to-end approach that circumvents the discrete parameter space by optimizing a relaxed, continuous, and differentiable upper bound of the typical classification loss. The approach can be interpreted as a regularizer inspired by the Minimum Description Length (MDL) principle. For each layer of the network, our approach optimizes real-valued translation and scaling factors together with arbitrary-precision integer-valued parameters (weights). We empirically compare BitNet to an equivalent unregularized model on the MNIST and CIFAR-10 datasets. We show that BitNet converges faster to a higher-quality solution. Additionally, the resulting model yields significant memory savings due to its integer-valued parameters.
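
For concreteness, below is a minimal NumPy sketch of the per-layer parameterization described above: each layer stores a real-valued translation (offset) and scaling factor together with k-bit integer codes for its weights. The function names and the simple min/max fitting are illustrative assumptions only; the paper learns these quantities jointly through the relaxed, differentiable objective rather than by post-hoc rounding.

```python
import numpy as np

def quantize_layer(weights, num_bits):
    """Illustrative (hypothetical) helper: map real-valued weights to
    integer codes in [0, 2^num_bits - 1] plus a per-layer offset
    (translation) and scale (step size)."""
    offset = weights.min()                                   # per-layer translation
    scale = (weights.max() - offset) / (2 ** num_bits - 1)   # per-layer step size
    codes = np.round((weights - offset) / scale).astype(np.int64)
    return offset, scale, codes

def dequantize_layer(offset, scale, codes):
    """Reconstruct the real-valued weights used in the forward pass."""
    return offset + scale * codes

# Example: 4-bit codes for a random weight matrix
w = np.random.randn(128, 64).astype(np.float32)
offset, scale, codes = quantize_layer(w, num_bits=4)
w_hat = dequantize_layer(offset, scale, codes)
print(codes.min(), codes.max())    # integers in [0, 15]
print(np.abs(w - w_hat).max())     # reconstruction error is at most scale / 2
```

Storing only the integer codes plus two real numbers per layer is what gives the memory savings: with 4-bit codes, each weight costs 4 bits instead of 32.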
