Towards Collaborative Intelligence Friendly Architectures for Deep Learning

Modern mobile devices are equipped with high-performance hardware resources such as graphics processing units (GPUs), making on-device intelligent services more feasible. Recently, specialized silicon in the form of neural engines has even begun to appear in mobile devices. However, most mobile devices are still not capable of performing real-time inference with very deep models, so the computations associated with the deep models behind today's intelligent applications are typically performed solely in the cloud. This cloud-only approach requires significant amounts of raw data to be uploaded to the cloud over the mobile wireless network and imposes a considerable computational and communication load on the cloud server. Recent studies have shown that the latency and energy consumption of deep neural networks in mobile applications can be reduced notably by splitting the workload between the mobile device and the cloud. In this approach, referred to as collaborative intelligence, the intermediate features computed on the mobile device are offloaded to the cloud instead of the raw input data of the network, reducing the amount of data that must be sent to the cloud. In this paper, we design a new collaborative-intelligence-friendly architecture by introducing a unit, placed after a selected layer of a deep model, that further reduces the size of the feature data that must be offloaded to the cloud. We refer to this unit as the butterfly unit. The butterfly unit consists of a reduction unit and a restoration unit. The output of the reduction unit is offloaded to the cloud server, where the computations associated with the restoration unit and the rest of the inference network are performed; both the reduction and restoration units use a convolutional layer as their main component. The inference outcome is then sent back to the mobile device. The new network architecture, with the butterfly unit inserted after a selected layer of the underlying deep model, is trained end-to-end. Across different wireless networks, our proposed method achieves on average a 53x improvement in end-to-end latency and a 68x improvement in mobile energy consumption over the status quo cloud-only approach for ResNet-50, while the accuracy loss is less than 2%.
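To make the description of the butterfly unit concrete, the following is a minimal PyTorch sketch. It assumes 1x1 convolutions for both units and omits any normalization or activation layers; the class name, constructor parameters, and kernel size are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class ButterflyUnit(nn.Module):
    """Minimal sketch of a butterfly unit: a reduction unit that shrinks
    the channel dimension of a feature tensor before offloading, and a
    restoration unit that expands it back on the cloud side.

    The 1x1 kernel, the bottleneck width, and the absence of
    normalization/activation layers are assumptions, not paper details.
    """

    def __init__(self, in_channels: int, bottleneck_channels: int):
        super().__init__()
        # Runs on the mobile device: compress the channel dimension.
        self.reduction = nn.Conv2d(in_channels, bottleneck_channels, kernel_size=1)
        # Runs on the cloud server: restore the original channel count.
        self.restoration = nn.Conv2d(bottleneck_channels, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # During end-to-end training both halves run in one graph; at
        # deployment, only the output of `reduction` crosses the network.
        compressed = self.reduction(x)
        restored = self.restoration(compressed)
        return restored
```

The key design point is that only the narrow tensor produced by `reduction` crosses the wireless link, while `restoration` and everything after it run on the server.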

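The sketch below illustrates, under the same assumptions, how such a unit might be spliced into ResNet-50 and trained end-to-end. It reuses the `ButterflyUnit` class from the previous sketch; the split point after `layer1` and the bottleneck width of 16 channels are hypothetical choices, since the paper itself selects the layer after which the unit is placed.

```python
import torch
import torch.nn as nn
import torchvision.models as models

resnet = models.resnet50(weights=None)

# Layers assumed to run on the mobile device, up to the split point.
mobile_head = nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
    resnet.layer1,  # outputs a (N, 256, 56, 56) feature tensor
)
# Hypothetical bottleneck width; the reduced tensor is what gets offloaded.
butterfly = ButterflyUnit(in_channels=256, bottleneck_channels=16)
# Layers assumed to run on the cloud server after restoration.
cloud_tail = nn.Sequential(
    resnet.layer2, resnet.layer3, resnet.layer4,
    resnet.avgpool, nn.Flatten(), resnet.fc,
)

full_model = nn.Sequential(mobile_head, butterfly, cloud_tail)

# End-to-end training treats the butterfly unit like any other layer.
x = torch.randn(8, 3, 224, 224)
logits = full_model(x)  # shape: (8, 1000)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 1000, (8,)))
loss.backward()
```

With these assumed values, the tensor crossing the network has 16 x 56 x 56 = 50,176 elements versus 224 x 224 x 3 = 150,528 for the raw image, a 3x reduction even before any quantization or compression; the savings reported in the paper depend on its actual split point and bottleneck width.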