Accurate, Low-Latency Visual Perception for Autonomous Racing: Challenges, Mechanisms, and Practical Solutions

Autonomous racing provides the opportunity to test safety-critical perception pipelines at their limit. This paper describes the practical challenges of, and solutions to, applying state-of-the-art computer vision algorithms to build a low-latency, high-accuracy perception system for DUT18 Driverless (DUT18D), a 4WD electric race car with podium finishes at all Formula Driverless competitions in which it raced. The key components of DUT18D include YOLOv3-based object detection, pose estimation, and time synchronization on its dual stereovision/monovision camera setup. We highlight the modifications required to adapt perception CNNs to the racing domain, improvements to the loss functions used for pose estimation, and methodologies for sub-microsecond camera synchronization, among other improvements. We perform a thorough experimental evaluation of the system, demonstrating its accuracy and low latency in real-world racing scenarios.
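As a minimal illustration of the stereovision half of such a setup (a generic pinhole-stereo sketch, not the paper's implementation; the focal length and baseline values below are hypothetical), the distance to a detected cone follows from the horizontal disparity of a matched keypoint between the left and right images:

```python
def stereo_depth(u_left: float, u_right: float,
                 focal_px: float, baseline_m: float) -> float:
    """Depth in meters from the horizontal pixel disparity of the same
    cone keypoint observed in a rectified left/right stereo pair:
        Z = f * B / (u_left - u_right)
    """
    disparity = u_left - u_right
    if disparity <= 0:
        # A point in front of a rectified stereo rig must have positive disparity.
        raise ValueError("non-positive disparity: point not in front of the rig")
    return focal_px * baseline_m / disparity


# Illustrative numbers only: a 1000 px focal length, 0.2 m baseline,
# and 10 px disparity place the cone 20 m ahead.
z = stereo_depth(u_left=400.0, u_right=390.0, focal_px=1000.0, baseline_m=0.2)
```

Because depth error grows quadratically with distance for a fixed disparity uncertainty, stereo rigs of this kind are typically trusted only out to a range set by the baseline and pixel pitch, which is one motivation for pairing stereo with a longer-range monocular pipeline.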
