Paper Information - RT-mDL: Supporting Real-Time Mixed Deep Learning Tasks on Edge Platforms

Recent years have witnessed an emerging class of real-time applications, e.g., autonomous driving, in which resource-constrained edge platforms need to execute a set of real-time mixed Deep Learning (DL) tasks concurrently. This application paradigm poses major challenges due to the heavy compute workload of deep neural network models, the diverse performance requirements of different tasks, and the lack of real-time support in existing DL frameworks. In this paper, we present RT-mDL, a novel framework to support mixed real-time DL tasks on edge platforms with heterogeneous CPU and GPU resources. RT-mDL aims to optimize mixed DL task execution to meet the tasks' diverse real-time/accuracy requirements by exploiting the unique compute characteristics of DL tasks. RT-mDL employs a novel storage-bounded model scaling method to generate a series of model variants, and systematically optimizes DL task execution by jointly selecting model variants and assigning task priorities. To improve the CPU/GPU utilization of mixed DL tasks, RT-mDL also includes a new priority-based scheduler that employs a GPU packing mechanism and executes CPU and GPU tasks independently. Our implementation on an F1/10 autonomous driving testbed shows that RT-mDL can enable multiple concurrent DL tasks to achieve satisfactory real-time performance in traffic light detection and sign recognition. Moreover, compared to state-of-the-art baselines, RT-mDL can reduce the deadline miss rate by 40.12% while sacrificing only 1.7% model accuracy.
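The idea of jointly choosing a model variant and a priority for each task can be illustrated with a toy sketch. This is not RT-mDL's optimizer: it assumes a single processor, one simultaneous release of all tasks, non-preemptive execution in priority order, and a brute-force search. The function name `select_variants_and_priorities`, the task names, and all latency, accuracy, and deadline figures are hypothetical and chosen only for illustration.

```python
from itertools import permutations, product


def select_variants_and_priorities(tasks):
    """Toy joint search (an assumption, not the paper's algorithm).

    tasks maps a task name to {"deadline": ms, "variants": [(latency_ms, accuracy_pct), ...]}.
    Returns (total_accuracy, {task: variant_index}, priority_order) or None.
    """
    names = list(tasks)
    best = None
    # Try every combination of one variant per task.
    for choice in product(*(range(len(tasks[n]["variants"])) for n in names)):
        picked = dict(zip(names, choice))
        total_acc = sum(tasks[n]["variants"][picked[n]][1] for n in names)
        if best is not None and total_acc <= best[0]:
            continue  # cannot improve on the best feasible plan found so far
        # Try every priority order; tasks run back to back, highest priority first.
        for order in permutations(names):
            elapsed = 0
            feasible = True
            for n in order:
                elapsed += tasks[n]["variants"][picked[n]][0]
                if elapsed > tasks[n]["deadline"]:
                    feasible = False
                    break
            if feasible:
                best = (total_acc, picked, order)
                break
    return best


# Hypothetical workload: latencies and deadlines in ms, accuracies in percent.
tasks = {
    "detector": {"deadline": 50, "variants": [(40, 90), (25, 87), (15, 80)]},
    "classifier": {"deadline": 30, "variants": [(20, 95), (10, 93)]},
}
best = select_variants_and_priorities(tasks)
print(best)
```

In this made-up workload the two most accurate variants cannot both meet their deadlines under either priority order, so the search falls back to a smaller classifier variant and runs the classifier first. The framework described in the abstract additionally handles heterogeneous CPU/GPU execution and storage bounds, which this sketch ignores.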

Guoliang Xing | Yuze He | Neiwen Ling | Kai Wang | Daqi Xie
