Modularized Multi-Stage Binarized Deep-Model Compression and Performance Recovery

Although the Internet of Things (IoT) era is approaching, most smart devices cannot directly execute complicated deep neural networks (DNNs) because of their limited memory and computing power. Even with model compression, compressing and executing a non-binarized DNN model often requires a codebook for weight lookup, which in turn typically requires acceleration from a Graphics Processing Unit (GPU). Parameter binarization is a commonly used method for reducing computing time and memory, but it often seriously compromises the accuracy of the resulting binarized DNN (BNN), which typically follows a single-stage, non-modularized design. To address these issues, we propose a modularized, multi-stage BNN model compression scheme that flexibly combines parameter and structure compression to minimize model size while preserving as much accuracy as possible. The compressed BNNs need no codebook, so they fit better into embedded platforms without GPU acceleration. In our evaluations, a BNN model can be reduced to 1/57 of its original size at the cost of only about a 5% drop in accuracy, outperforming state-of-the-art single-stage compression of BNN models. Furthermore, the proposed method can recover about 5% of a BNN's accuracy even after the model has been further compressed to approximately half its size, demonstrating its potential for enabling edge intelligence.
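
For illustration, below is a minimal sketch of the parameter-binarization step mentioned above, using sign-based weight binarization with a straight-through estimator in the style of BinaryConnect. The class names, layer sizes, and the particular estimator variant are assumptions for this sketch, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """Binarize weights to {-1, +1}; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: suppress gradients where |w| > 1.
        return grad_out * (w.abs() <= 1).float()


class BinaryLinear(nn.Linear):
    """Fully connected layer whose weights are binarized on the fly,
    while the real-valued weights are kept and updated during training."""

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        return nn.functional.linear(x, w_bin, self.bias)


if __name__ == "__main__":
    layer = BinaryLinear(784, 128)   # hypothetical layer sizes
    out = layer(torch.randn(32, 784))
    out.sum().backward()             # gradients flow to the real-valued weights
    print(out.shape, layer.weight.grad.shape)
```

After training, each binarized weight can be stored in a single bit (e.g., eight weights packed per byte) with no codebook, which is where most of a BNN's size reduction comes from; structure compression, such as pruning whole neurons or filters, can then be applied on top, matching the parameter-plus-structure combination described in the abstract.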
