Characterizing the Deployment of Deep Neural Networks on Commercial Edge Devices

Deep neural networks (DNNs) have achieved great success in numerous applications, such as computer vision, and are now widely used in applications and systems. However, in-the-edge inference of DNNs remains a severe challenge, mainly because of the mismatch between the inherently intensive resource requirements of DNNs and the tight resource availability of edge devices. Nevertheless, in-the-edge inference preserves privacy in several user-centric domains and is applicable in scenarios with limited Internet connectivity (e.g., drones, robots, autonomous vehicles). For these reasons, several companies have released specialized edge devices that accelerate the execution of DNNs at the edge. Although preliminary studies have characterized such edge devices separately, a unified comparison under the same set of assumptions has not been fully performed. In this paper, we endeavor to address this knowledge gap by characterizing several commercial edge devices on popular frameworks using well-known convolutional neural networks (CNNs), a type of DNN. We analyze the impact of the frameworks, their software stacks, and their implemented optimizations on the final performance. Moreover, we measure the energy consumption and temperature behavior of these edge devices.
