Multi-Task Zipping via Layer-wise Neuron Sharing

Future mobile devices are anticipated to perceive, understand, and react to the world on their own by running multiple correlated deep neural networks on-device. Yet the complexity of these neural networks needs to be trimmed down both within and across models to fit in mobile storage and memory. Previous studies focus on squeezing the redundancy within a single neural network. In this work, we aim to reduce the redundancy across multiple models. We propose Multi-Task Zipping (MTZ), a framework that automatically merges correlated, pre-trained deep neural networks for cross-model compression. Central to MTZ is a layer-wise neuron sharing and incoming weight updating scheme that induces a minimal change in the error function. MTZ inherits information from each model and requires only light retraining to restore the accuracy of the individual tasks. Evaluations show that MTZ can fully merge the hidden layers of two VGG-16 networks with a 3.18% increase in test error averaged over ImageNet and CelebA, or share 39.61% of their parameters with less than 0.5% increase in test error for both tasks. The number of iterations needed to retrain the combined network is at least 17.8 times lower than that needed to train a single VGG-16 network. Moreover, experiments show that MTZ also effectively merges multiple residual networks.
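The layer-wise zipping step can be pictured with a short sketch. This is an illustration rather than the authors' code: MTZ selects which neuron of task A to merge with which neuron of task B using a layer-wise, Hessian-based estimate of the induced error, and derives the shared incoming weights from the same Hessian information. The sketch below substitutes a simple Euclidean-distance criterion and a plain average of incoming weights as stand-ins, and the function name and tensor shapes are hypothetical.

```python
# Illustrative sketch of layer-wise neuron sharing (not the authors' implementation).
# MTZ pairs neurons across two pre-trained networks so that merging them induces a
# minimal change in the error function; the paper uses a layer-wise Hessian-based
# criterion and weight update. Here, Euclidean distance between incoming weight
# vectors and a plain average act as simplified stand-ins for that criterion.
import numpy as np

def zip_layer(W_a: np.ndarray, W_b: np.ndarray, num_shared: int):
    """Merge `num_shared` neuron pairs between the same layer of two networks.

    W_a, W_b: incoming weight matrices of shape (n_in, n_neurons) for task A and B.
    Returns a list of (i, j, shared_weights) tuples for the merged pairs.
    """
    n = W_a.shape[1]
    # Proxy for the error increase of merging neuron i (task A) with neuron j (task B):
    # distance between their incoming weight vectors (the paper uses a Hessian metric).
    cost = np.linalg.norm(W_a[:, :, None] - W_b[:, None, :], axis=0)

    pairs, used_a, used_b = [], set(), set()
    for _ in range(num_shared):
        # Greedily pick the cheapest still-unused pair.
        _, i, j = min(
            ((cost[i, j], i, j) for i in range(n) for j in range(n)
             if i not in used_a and j not in used_b),
            key=lambda t: t[0],
        )
        used_a.add(i)
        used_b.add(j)
        # Shared incoming weights; the paper computes a Hessian-weighted update instead.
        shared_w = 0.5 * (W_a[:, i] + W_b[:, j])
        pairs.append((i, j, shared_w))
    return pairs

# Toy usage: share 3 of 8 neurons between two random 16-input layers.
rng = np.random.default_rng(0)
merged = zip_layer(rng.normal(size=(16, 8)), rng.normal(size=(16, 8)), num_shared=3)
print([(i, j) for i, j, _ in merged])
```

In the full framework this pairing and weight update is applied layer by layer, with light retraining after each layer is zipped to restore the accuracy of both tasks.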
