Label Efficient Visual Abstractions for Autonomous Driving

Semantic segmentation is known to be an effective intermediate representation for learning driving policies. However, street scene semantic segmentation requires expensive annotations. Furthermore, segmentation algorithms are often trained independently of the actual driving task, using auxiliary image-space loss functions that are not guaranteed to improve driving metrics such as safety or distance traveled per intervention. In this work, we quantify the impact of reducing segmentation annotation costs on learned behavior cloning agents. We analyze several segmentation-based intermediate representations and use these visual abstractions to systematically study the trade-off between annotation efficiency and driving performance along three axes: the types of classes labeled, the number of image samples used to train the visual abstraction model, and the annotation granularity (e.g., object masks vs. 2D bounding boxes). Our analysis uncovers several practical insights into how segmentation-based visual abstractions can be exploited in a more label-efficient manner. Surprisingly, we find that state-of-the-art driving performance can be achieved with an orders-of-magnitude reduction in annotation cost. Beyond label efficiency, we find several additional training benefits of visual abstractions, such as a significant reduction in the variance of the learned policy compared to state-of-the-art end-to-end driving models.
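One of the annotation-efficiency axes studied above is the number of semantic classes labeled. The idea can be sketched as collapsing a fine-grained street-scene taxonomy into a small set of driving-relevant classes before the policy ever sees the image. The mapping below is a minimal, hypothetical illustration; the class names and groupings are assumptions for this sketch, not the paper's exact taxonomy.

```python
# Sketch: reduce a fine-grained per-pixel label map to a coarse,
# driving-relevant class set. Fewer classes means cheaper annotation
# for the segmentation model that produces the visual abstraction.

# Illustrative mapping (assumed, not the paper's exact choice):
FINE_TO_COARSE = {
    "road": "road",
    "sidewalk": "road",
    "car": "vehicle",
    "truck": "vehicle",
    "bus": "vehicle",
    "person": "pedestrian",
    "rider": "pedestrian",
    "traffic_light": "infrastructure",
    "pole": "infrastructure",
    "building": "background",
    "vegetation": "background",
    "sky": "background",
}

def reduce_labels(segmentation, mapping=FINE_TO_COARSE, default="background"):
    """Map each pixel's fine label to its coarse class.

    `segmentation` is a 2D list of fine-grained label strings;
    unknown labels fall back to `default`.
    """
    return [[mapping.get(label, default) for label in row]
            for row in segmentation]

coarse = reduce_labels([["car", "sky"], ["person", "road"]])
```

A behavior cloning policy trained on such a reduced label map only needs annotations for the coarse classes, which is one way the annotation budget can shrink without discarding the driving-relevant structure of the scene.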
