Intermediate Deep Feature Compression: the Next Battlefield of Intelligent Sensing

The recent advances of hardware technology have made the intelligent analysis equipped at the front-end with deep learning more prevailing and practical. To better enable the intelligent sensing at the front-end, instead of compressing and transmitting visual signals or the ultimately utilized top-layer deep learning features, we propose to compactly represent and convey the intermediate-layer deep learning features of high generalization capability, to facilitate the collaborating approach between front and cloud ends. This strategy enables a good balance among the computational load, transmission load and the generalization ability for cloud servers when deploying the deep neural networks for large scale cloud based visual analysis. Moreover, the presented strategy also makes the standardization of deep feature coding more feasible and promising, as a series of tasks can simultaneously benefit from the transmitted intermediate layers. We also present the results for evaluation of lossless deep feature compression with four benchmark data compression methods, which provides meaningful investigations and baselines for future research and standardization activities.

[1]  Jitendra Malik,et al.  Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Jie Lin,et al.  A practical guide to CNNs and Fisher Vectors for image instance retrieval , 2015, Signal Process..

[3]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Wen Gao,et al.  HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval , 2017, IEEE Transactions on Multimedia.

[6]  Hanjiang Lai,et al.  Simultaneous feature learning and hash coding with deep neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Gregory J. Wolff,et al.  Optimal Brain Surgeon and general network pruning , 1993, IEEE International Conference on Neural Networks.

[8]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[9]  Luis Ceze,et al.  Neural Acceleration for General-Purpose Approximate Programs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.

[10]  Igor Carron,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016 .

[11]  Victor S. Lempitsky,et al.  Neural Codes for Image Retrieval , 2014, ECCV.

[12]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[13]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[14]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[15]  Xiaogang Wang,et al.  Visual Tracking with Fully Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Lin Wu,et al.  Effective Multi-Query Expansions: Collaborative Deep Networks for Robust Landmark Retrieval , 2017, IEEE Transactions on Image Processing.

[17]  Xin Zhang,et al.  End to End Learning for Self-Driving Cars , 2016, ArXiv.

[18]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[19]  David Gregg,et al.  The Movidius Myriad Architecture's Potential for Scientific Computing , 2015, IEEE Micro.

[20]  Xiaogang Wang,et al.  Joint Deep Learning for Pedestrian Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[21]  Xiaogang Wang,et al.  DeepID3: Face Recognition with Very Deep Neural Networks , 2015, ArXiv.

[22]  Rich Caruana,et al.  Do Deep Nets Really Need to be Deep? , 2013, NIPS.

[23]  Hanjiang Lai,et al.  Supervised Hashing for Image Retrieval via Image Representation Learning , 2014, AAAI.

[24]  Tiejun Huang,et al.  Deep Relative Distance Learning: Tell the Difference between Similar Vehicles , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yoshua Bengio,et al.  FitNets: Hints for Thin Deep Nets , 2014, ICLR.

[26]  Jen-Hao Hsiao,et al.  Deep learning of binary hash codes for fast image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[27]  Jiasen Lu,et al.  Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.

[28]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[29]  Ajay Luthra,et al.  Overview of the H.264/AVC video coding standard , 2003, IEEE Trans. Circuits Syst. Video Technol..

[30]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[31]  Terry A. Welch,et al.  A Technique for High-Performance Data Compression , 1984, Computer.

[32]  Paul Rad,et al.  A Next-Generation Secure Cloud-Based Deep Learning License Plate Recognition for Smart Cities , 2016, 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA).

[33]  Bin Liu,et al.  Ternary Weight Networks , 2016, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[34]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[35]  Peter Deutsch,et al.  DEFLATE Compressed Data Format Specification version 1.3 , 1996, RFC.

[36]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Wen Gao,et al.  AI-Oriented Large-Scale Video Management for Smart City: Technologies, Standards, and Beyond , 2017, IEEE MultiMedia.

[38]  Xiaogang Wang,et al.  Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  David A. Huffman,et al.  A method for the construction of minimum-redundancy codes , 1952, Proceedings of the IRE.

[40]  Gary J. Sullivan,et al.  Overview of the High Efficiency Video Coding (HEVC) Standard , 2012, IEEE Transactions on Circuits and Systems for Video Technology.

[41]  Yu Wang,et al.  A Survey of FPGA-Based Neural Network Accelerator , 2017, 1712.08934.

[42]  Trevor Darrell,et al.  Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.

[43]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[46]  Ivan V. Bajic,et al.  Near-Lossless Deep Feature Compression for Collaborative Intelligence , 2018, 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP).

[47]  Xiaogang Wang,et al.  Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[48]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[49]  Ming Yang,et al.  DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[51]  Ivan V. Bajic,et al.  Deep Feature Compression for Collaborative Object Detection , 2018, 2018 25th IEEE International Conference on Image Processing (ICIP).

[52]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[53]  Tieniu Tan,et al.  Deep semantic ranking based hashing for multi-label image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[55]  Abraham Lempel,et al.  A universal algorithm for sequential data compression , 1977, IEEE Trans. Inf. Theory.

[56]  Gang Wang,et al.  Stack-Captioning: Coarse-to-Fine Learning for Image Captioning , 2017, AAAI.

[57]  Ling-Yu Duan,et al.  Compact Descriptors for Video Analysis: The Emerging MPEG Standard , 2017, IEEE MultiMedia.