ParaDiS: Parallelly Distributable Slimmable Neural Networks

When several low-power devices are available, one of the most efficient ways to exploit these resources, while reducing processing latency and communication load, is to run several neural sub-networks in parallel and to fuse their outputs at the end of processing. However, such a combination of sub-networks must be trained specifically for each particular device configuration (characterized by the number of devices and their capacities), which may vary across deployments and even within a single deployment. In this work we introduce parallelly distributable slimmable (ParaDiS) neural networks, which can be split in parallel over various device configurations without retraining. While inspired by slimmable networks, which allow instant adaptation to the resources of a single device, ParaDiS networks consist of several multi-device distributable configurations, or switches, that strongly share parameters between them. We evaluate the ParaDiS framework on the MobileNet v1 and ResNet-50 architectures for ImageNet classification and on the WDSR architecture for image super-resolution. We show that ParaDiS switches achieve accuracy similar to or better than that of individual models, i.e., distributed models of the same structure trained separately. Moreover, compared to universally slimmable networks, which are not distributable, the accuracy of distributable ParaDiS switches either does not drop at all or drops by at most 1% in the worst cases. Finally, once distributed over several devices, ParaDiS greatly outperforms slimmable models.
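To make the split-and-fuse idea above concrete, the following is a minimal, hypothetical PyTorch sketch: each "device" runs an independent narrow sub-network (a branch whose width is a fraction of a reference channel count), and the branch outputs are fused at the end, here simply by averaging logits. The class names (`SubNet`, `ParallelSwitch`), the width fractions, and the fusion rule are illustrative assumptions rather than the paper's exact architecture; in particular, the strong parameter sharing between different switches that ParaDiS relies on is not shown.

```python
# Hypothetical sketch of parallel split-and-fuse inference across devices.
import torch
import torch.nn as nn


class SubNet(nn.Module):
    """One narrow sub-network, sized by a width fraction of a reference channel count."""

    def __init__(self, width_fraction, base_channels=64, num_classes=10):
        super().__init__()
        c = max(1, int(base_channels * width_fraction))
        self.features = nn.Sequential(
            nn.Conv2d(3, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, stride=2, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(c, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class ParallelSwitch(nn.Module):
    """One 'switch': a set of sub-networks, each of which could run on its own device,
    with their outputs fused at the end of processing."""

    def __init__(self, width_fractions, num_classes=10):
        super().__init__()
        self.branches = nn.ModuleList(
            [SubNet(w, num_classes=num_classes) for w in width_fractions]
        )

    def forward(self, x):
        # Fuse by averaging logits; other fusion rules (sum, learned head) are possible.
        logits = [branch(x) for branch in self.branches]
        return torch.stack(logits, dim=0).mean(dim=0)


if __name__ == "__main__":
    # Example: a 2-device switch with unequal capacities (0.5x and 0.25x width).
    model = ParallelSwitch(width_fractions=[0.5, 0.25])
    out = model(torch.randn(4, 3, 32, 32))
    print(out.shape)  # torch.Size([4, 10])
```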
