Dynamic DNN Decomposition for Lossless Synergistic Inference

Deep neural networks (DNNs) sustain high performance in today’s data processing applications. DNN inference is resource-intensive and is therefore difficult to fit into a mobile device. To solve this problem, recent work delivers DNN services through the edge computing paradigm. Existing approaches split a DNN into two parts and deploy the two partitions to computation nodes at two edge computing tiers. However, these methods overlook collaborative device-edge-cloud computation resources. In addition, previous algorithms require re-partitioning the whole DNN to adapt to changes in computation resources and network dynamics. Moreover, for resource-demanding convolutional layers, prior works do not provide a parallel processing strategy at the edge side that avoids loss of accuracy. To tackle these issues, we propose D3, a dynamic DNN decomposition system for synergistic inference without precision loss. The proposed system introduces a heuristic algorithm, the horizontal partition algorithm, to split a DNN into three parts. The algorithm partially adjusts the partitions at run time according to processing time and network conditions. At the edge side, a vertical separation module separates feature maps into tiles that can be processed independently on different edge nodes in parallel. Extensive quantitative evaluation of five popular DNNs shows that D3 outperforms state-of-the-art counterparts by up to 3.4× in end-to-end DNN inference time and reduces backbone network communication overhead by up to 3.68×.
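As a rough illustration of why tiling a feature map need not cost accuracy, the following minimal sketch splits the output of a single stride-1, unpadded convolution into horizontal bands and computes each band from an input slice that includes the extra rows it depends on. This is not the D3 vertical separation module itself: the function names `conv2d_valid` and `conv2d_tiled`, the row-band tiling, and the single-layer setting are all assumptions made for illustration.

```python
def conv2d_valid(x, k):
    """Plain 2D cross-correlation on nested lists (stride 1, no padding)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def conv2d_tiled(x, k, n_tiles):
    """Hypothetical helper: compute the same result band by band.

    Output rows are split into up to n_tiles bands. Each band only needs
    its own input rows plus (kh - 1) overlapping rows, so every band could
    be sent to a different node and computed independently.
    """
    kh = len(k)
    oh = len(x) - kh + 1
    bounds = [oh * t // n_tiles for t in range(n_tiles + 1)]
    out = []
    for r0, r1 in zip(bounds, bounds[1:]):
        if r1 == r0:
            continue  # more tiles than output rows: skip empty bands
        tile = x[r0 : r1 + kh - 1]  # input rows needed for output rows r0..r1-1
        out.extend(conv2d_valid(tile, k))  # stitch bands back in order
    return out
```

Because each band receives every input row it depends on, the stitched result is identical to the unsplit convolution; the price is that the overlapping rows are transferred to more than one node.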
