A Survey on optimized implementation of deep learning models on the NVIDIA Jetson platform

Abstract: The design of hardware accelerators for neural network (NN) applications involves walking a tightrope between the competing constraints of low power, high accuracy, and high throughput. NVIDIA’s Jetson is a promising platform for embedded machine learning that seeks to balance these objectives. In this paper, we survey works that evaluate and optimize neural network applications on the Jetson platform. We review both hardware and algorithmic optimizations performed for running NN algorithms on Jetson and show the real-life applications where these algorithms have been applied. We also review works that compare Jetson with similar platforms. While the survey focuses on Jetson as an exemplar embedded system, many of the ideas and optimizations apply just as well to existing and future embedded systems. It is widely believed that the ability to run AI algorithms on low-cost, low-power platforms will be crucial for achieving the “AI for all” vision. This survey seeks to provide a glimpse of the recent progress towards that goal.
