Modularized Multi-Stage Binarized Deep-Model Compression and Performance Recovery

Although the Internet of Things (IoT) era is approaching, most smart devices cannot directly execute complicated deep neural networks (DNNs) because of their limited memory and computing power. Even with model compression, compressing and executing a non-binarized DNN model often requires a codebook for weight lookup, which in turn typically requires acceleration from a Graphics Processing Unit (GPU). Parameter binarization is a commonly used method for reducing computing time and memory, but it often seriously compromises the accuracy of the resulting binarized DNN (BNN), which typically follows a single-stage, non-modularized design. To address these issues, we propose a modularized, multi-stage BNN model compression scheme that flexibly combines parameter and structure compression to minimize model size while preserving as much accuracy as possible. The compressed BNNs need no codebook, so they fit better into embedded platforms without GPU acceleration. In our evaluations, a BNN model can be reduced to 1/57 of its original size at the cost of only about a 5% drop in accuracy, outperforming state-of-the-art single-stage compression of BNN models. Furthermore, the proposed method can recover about 5% of a BNN's accuracy even after the model has been further compressed to approximately half its size, demonstrating its potential for enabling edge intelligence.
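
For illustration, below is a minimal sketch of the parameter-binarization step mentioned above, using sign-based weight binarization with a straight-through estimator in the style of BinaryConnect. The class names, layer sizes, and the particular estimator variant are assumptions for this sketch, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """Binarize weights to {-1, +1}; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Straight-through estimator: suppress gradients where |w| > 1.
        return grad_out * (w.abs() <= 1).float()


class BinaryLinear(nn.Linear):
    """Fully connected layer whose weights are binarized on the fly,
    while the real-valued weights are kept and updated during training."""

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight)
        return nn.functional.linear(x, w_bin, self.bias)


if __name__ == "__main__":
    layer = BinaryLinear(784, 128)   # hypothetical layer sizes
    out = layer(torch.randn(32, 784))
    out.sum().backward()             # gradients flow to the real-valued weights
    print(out.shape, layer.weight.grad.shape)
```

After training, each binarized weight can be stored in a single bit (e.g., eight weights packed per byte) with no codebook, which is where most of a BNN's size reduction comes from; structure compression, such as pruning whole neurons or filters, can then be applied on top, matching the parameter-plus-structure combination described in the abstract.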
