Intermediate deep-feature compression for multitasking

Collaborative intelligence is a new strategy for deploying deep neural network models on AI-based mobile devices: part of the model runs on the mobile device to extract features, and the rest runs in the cloud. In this setting, feature data rather than the raw image is transmitted to the cloud, and the uploaded features must generalize well enough to serve multiple tasks. To this end, we design an encoder-decoder network to extract intermediate deep features from an image and propose a method that enables these features to serve different tasks. Finally, we apply a lossy compression method to the intermediate deep features to improve transmission efficiency. Experimental results show that the features extracted by our network support input reconstruction and object detection simultaneously. Moreover, with the proposed deep-feature compression method, the reconstructed images are of good quality both visually and in terms of quantitative metrics, and object detection also achieves good accuracy.
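As a rough illustration of what lossy compression of intermediate features can look like, the sketch below uniformly quantizes a flat list of feature values to 8 bits and then applies a lossless coder. This is not the method proposed in the paper, which the abstract does not specify. The 8-bit uniform quantizer, the use of zlib as a stand-in entropy coder, and all function names are assumptions made for illustration only.

```python
import zlib

def quantize(features):
    """Uniformly quantize floats to 8-bit codes (the lossy step, assumed here)."""
    lo, hi = min(features), max(features)
    # Step size between adjacent quantization levels; avoid division by zero.
    scale = (hi - lo) / 255 if hi > lo else 1.0
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in features)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Map 8-bit codes back to approximate feature values."""
    return [lo + c * scale for c in codes]

def compress(features):
    """Mobile side: quantize, then losslessly pack the codes with zlib."""
    codes, lo, scale = quantize(features)
    return zlib.compress(codes), lo, scale

def decompress(payload, lo, scale):
    """Cloud side: unpack the codes and reconstruct approximate features."""
    return dequantize(zlib.decompress(payload), lo, scale)

if __name__ == "__main__":
    feats = [0.0, 0.5, 1.0, 0.25, 0.75]  # hypothetical feature values
    payload, lo, scale = compress(feats)
    restored = decompress(payload, lo, scale)
    print(len(payload), restored)
```

In this sketch the reconstruction error of each value is bounded by half a quantization step, so the cloud side receives features that are close to, but not identical to, those computed on the device. The minimum and step size (`lo`, `scale`) would have to be sent alongside the payload.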
