Memory- and Communication-Aware Model Compression for Distributed Deep Learning Inference on IoT

Model compression has emerged as an important area of research for deploying deep learning models on Internet-of-Things (IoT) devices. However, in extremely memory-constrained scenarios, even the compressed models cannot fit within the memory of a single device and, as a result, must be distributed across multiple devices. This leads to a distributed inference paradigm in which memory and communication costs represent a major bottleneck. Yet, existing model compression techniques are not communication-aware. Therefore, we propose Network of Neural Networks (NoNN), a new distributed IoT learning paradigm that compresses a large pretrained ‘teacher’ deep network into several disjoint and highly compressed ‘student’ modules without loss of accuracy. Specifically, we propose a network-science-based knowledge partitioning algorithm for the teacher model and then train individual students on the resulting disjoint partitions. Extensive experiments on five image classification datasets, under user-defined memory/performance budgets, show that NoNN achieves higher accuracy than several baselines and accuracy comparable to that of the teacher model, while requiring minimal communication among students. Finally, as a case study, we deploy the proposed model for the CIFAR-10 dataset on edge devices and demonstrate significant improvements in memory footprint (up to 24×), performance (up to 12×), and energy per node (up to 14×) compared to the large teacher model. We further show that, for distributed inference on multiple edge devices, our proposed NoNN model yields up to a 33× reduction in total latency compared to a state-of-the-art model compression baseline.
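To illustrate the network-science-based knowledge partitioning step, the sketch below builds a co-activation graph over the teacher's final-layer filters and splits it into one community per student using modularity-based community detection. This is only a minimal sketch under assumed details: the function `partition_teacher_filters`, the mean-activation binarization threshold, and the use of networkx's `greedy_modularity_communities` (with `best_n`, available in recent networkx versions) are illustrative choices, not the paper's actual implementation.

```python
# Hypothetical sketch of filter partitioning for NoNN-style students.
# Assumptions: the teacher's final-conv activations have been averaged
# spatially into a (num_samples, num_filters) matrix on a calibration set.
import numpy as np
import networkx as nx
from networkx.algorithms import community


def partition_teacher_filters(activations: np.ndarray, num_students: int):
    """Split teacher filters into `num_students` disjoint groups.

    activations: (num_samples, num_filters) average activations of the
    teacher's final convolutional layer on a calibration set.
    """
    num_filters = activations.shape[1]

    # Binarize: a filter is "active" on a sample if it exceeds its mean.
    active = activations > activations.mean(axis=0)

    # Edge weight = number of samples on which two filters fire together.
    co_activation = active.T.astype(float) @ active.astype(float)

    graph = nx.Graph()
    graph.add_nodes_from(range(num_filters))
    for i in range(num_filters):
        for j in range(i + 1, num_filters):
            if co_activation[i, j] > 0:
                graph.add_edge(i, j, weight=co_activation[i, j])

    # Modularity-maximizing community detection (cf. Newman's modularity);
    # best_n steers the result toward exactly num_students communities.
    communities = community.greedy_modularity_communities(
        graph, weight="weight", best_n=num_students)
    return [sorted(c) for c in communities]
```

Each resulting filter group would then serve as the disjoint knowledge partition for one student, which could be trained (e.g., via knowledge distillation) to mimic only its assigned portion of the teacher's representation.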
