Autonomy 2.0: Why is self-driving always 5 years away?

Despite the numerous successes of AI over the past decade in image recognition, decision-making, NLP and image synthesis, self-driving technology has not yet followed the same trend. In this paper, we study the history, composition, and development bottlenecks of the modern self-driving stacks adopted across the industry. We argue that the slow progress is caused by approaches that require too much hand-engineering, an over-reliance on road testing, and high fleet deployment costs. We observe that the classical stack has several bottlenecks that preclude the scale needed to capture the long tail of rare events. To resolve these problems, we outline the principles of Autonomy 2.0, an ML-first approach to self-driving, as a viable alternative to the currently adopted state-of-the-art. This approach is based on (i) a fully differentiable AV stack trainable from human demonstrations, (ii) closed-loop data-driven reactive simulation, and (iii) large-scale, low-cost data collections as critical solutions to scalability issues. We outline the general architecture, survey promising works in this direction and propose key challenges to be addressed by the community in the future.
