On-Device Deep Multi-Task Inference via Multi-Task Zipping

Future mobile devices are anticipated to perceive, understand, and react to the world on their own by running multiple correlated deep neural networks locally on-device. Yet the complexity of these deep models needs to be trimmed down both within-model and cross-model to fit in mobile storage and memory. Previous studies squeeze the redundancy within a single model; in this work, we aim to reduce the redundancy across multiple models. We propose Multi-Task Zipping (MTZ), a framework to automatically merge correlated, pre-trained deep neural networks for cross-model compression. Central to MTZ is a layer-wise neuron sharing and incoming weight updating scheme that induces a minimal change in the error function. MTZ inherits information from each model and demands only light retraining to re-boost the accuracy of the individual tasks. MTZ supports typical network layers (fully-connected, convolutional, and residual) and applies to inference tasks with different input domains. Evaluations show that MTZ can fully merge the hidden layers of two VGG-16 networks with a 3.18% increase in the test error averaged over ImageNet for object classification and CelebA for facial attribute classification, or share 39.61% of the parameters between the two networks with a <0.5% increase in the test errors. The number of iterations needed to retrain the combined network is at least 17.8× lower than that of training a single VGG-16 network. Moreover, MTZ can effectively merge nine residual networks for diverse inference tasks, as well as models for different input domains. With the model merged by MTZ, the latency to switch between these tasks on memory-constrained devices is reduced by 8.71×.
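To make the core idea concrete, below is a minimal, illustrative NumPy sketch of layer-wise neuron sharing between two pre-trained fully-connected layers. The function name merge_fc_layers is hypothetical, and two simplifications are assumptions of this sketch rather than the paper's method: MTZ selects neuron pairs by a Hessian-based estimate of the induced change in the error function, which is replaced here by plain Euclidean distance between incoming-weight vectors, and MTZ's optimal incoming-weight update is replaced by simple averaging.

```python
import numpy as np

def merge_fc_layers(W_a, W_b, num_shared):
    """Toy layer-wise neuron sharing between the same fully-connected
    layer of two pre-trained networks (hypothetical helper).

    W_a, W_b: (n_in, n_out) incoming-weight matrices of task A's and
              task B's network at this layer.
    num_shared: number of neuron pairs to merge into shared neurons.

    NOTE: MTZ picks pairs by a Hessian-based estimate of the change
    in the error function; this sketch substitutes plain Euclidean
    distance between incoming-weight vectors as the merge cost.
    """
    # Pairwise merge cost between every neuron of A and of B.
    cost = np.linalg.norm(
        W_a[:, :, None] - W_b[:, None, :], axis=0)  # (n_out, n_out)

    pairs, used_a, used_b = [], set(), set()
    # Greedily match the cheapest unused pairs first.
    for idx in np.argsort(cost, axis=None):
        i, j = np.unravel_index(idx, cost.shape)
        if i in used_a or j in used_b:
            continue
        pairs.append((i, j))
        used_a.add(i)
        used_b.add(j)
        if len(pairs) == num_shared:
            break

    # Merge each matched pair into one shared neuron. MTZ computes an
    # incoming-weight update that minimizes the induced error; here we
    # simply average the two weight vectors.
    W_shared = np.stack(
        [(W_a[:, i] + W_b[:, j]) / 2.0 for i, j in pairs], axis=1)
    return W_shared, pairs
```

In the full framework, this matching-and-merging step is applied layer by layer across the two networks, followed by light retraining to re-boost the accuracy of each task.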
