DeepPicar: A Low-Cost Deep Neural Network-Based Autonomous Car

We present DeepPicar, a low-cost, deep neural network-based autonomous car platform. DeepPicar is a small-scale replication of DAVE-2, a real self-driving car developed by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN) that takes images from a front-facing camera as input and produces car steering angles as output. DeepPicar uses the same network architecture (9 layers, 27 million connections, and 250K parameters) and can drive itself in real time using a web camera and a quad-core Raspberry Pi 3. Using DeepPicar, we analyze the Pi 3's ability to support end-to-end, deep learning-based real-time control of autonomous vehicles. We also systematically compare other contemporary embedded computing platforms using DeepPicar's CNN-based real-time control workload. We find that all tested platforms, including the Pi 3, can support CNN-based real-time control at rates from 20 Hz up to 100 Hz, depending on the hardware platform. However, shared resource contention remains an important issue that must be considered when deploying CNN models on shared-memory embedded computing platforms: we observe up to an 11.6X increase in the execution time of the CNN-based control loop due to shared resource contention. To protect the CNN workload, we also evaluate state-of-the-art cache partitioning and memory bandwidth throttling techniques on the Pi 3. We find that cache partitioning is ineffective, while memory bandwidth throttling is an effective solution.
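To make the relationship between control rate and execution-time inflation concrete, the following is a minimal sketch of a deadline check for a periodic control loop. The rates (20 Hz and 100 Hz) and the 11.6X worst-case slowdown are the figures quoted above; the function names and the 8 ms baseline execution time are hypothetical, chosen only for illustration, and are not measurements from the paper.

```python
def control_period_ms(rate_hz: float) -> float:
    """Return the control-loop period in milliseconds for a rate given in Hz."""
    return 1000.0 / rate_hz


def meets_deadline(exec_time_ms: float, rate_hz: float, slowdown: float = 1.0) -> bool:
    """Return True if one loop iteration, inflated by `slowdown`, fits in one period."""
    return exec_time_ms * slowdown <= control_period_ms(rate_hz)


if __name__ == "__main__":
    # Hypothetical baseline cost of one control-loop iteration (assumed, not measured).
    baseline_ms = 8.0
    # Worst-case execution time increase reported in the abstract.
    worst_case_slowdown = 11.6

    for rate_hz in (20, 100):
        alone = meets_deadline(baseline_ms, rate_hz)
        contended = meets_deadline(baseline_ms, rate_hz, worst_case_slowdown)
        print(f"{rate_hz} Hz: period={control_period_ms(rate_hz):.1f} ms, "
              f"alone ok={alone}, contended ok={contended}")
```

Under these assumed numbers, a loop that comfortably fits its period when running alone would miss it once the execution time is inflated by the reported worst-case factor, which is why isolating the CNN workload from co-running tasks matters.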
