The Architectural Implications of Autonomous Driving: Constraints and Acceleration

Autonomous driving systems have attracted significant interest recently, and many industry leaders, such as Google, Uber, Tesla, and Mobileye, have invested substantial capital and engineering effort in developing such systems. Building autonomous driving systems is particularly challenging because of stringent performance requirements: the system must both make safe operational decisions and finish processing in real time. Despite recent advances in technology, such systems are still largely experimental, and architecting end-to-end autonomous driving systems remains an open research question. To investigate this question, we first present and formalize the design constraints for building an autonomous driving system in terms of performance, predictability, storage, thermal, and power. We then build an end-to-end autonomous driving system using state-of-the-art, award-winning algorithms to understand the design trade-offs for building such systems. In our real-system characterization, we identify three computational bottlenecks that conventional multicore CPUs are incapable of processing under the identified design constraints. To meet these constraints, we accelerate these algorithms using three accelerator platforms, GPUs, FPGAs, and ASICs, which can reduce the tail latency of the system by 169x, 10x, and 93x, respectively. With accelerator-based designs, we are able to build an end-to-end autonomous driving system that meets all the design constraints, and we explore the trade-offs among performance, power, and the higher accuracy enabled by higher-resolution cameras.
