Supplementary Material for Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving

Data aggregation techniques can significantly improve vision-based policy learning within a training environment, e.g., learning to drive under a specific simulation condition. However, because on-policy data is sampled and aggregated iteratively, the policy can specialize to and overfit the training conditions. For real-world applications, it is desirable for the learned policy to generalize to novel scenarios that differ from the training conditions. To improve policy learning while maintaining robustness when training end-to-end driving policies, we perform an extensive analysis of data aggregation techniques in the CARLA environment. We show that the majority of these techniques generalize poorly, and we develop a novel approach with empirically better generalization than existing techniques. Our two key ideas are (1) to sample critical states from the collected on-policy data based on the utility they provide to the learned policy's driving behavior, and (2) to incorporate a replay buffer that progressively focuses on the high-uncertainty regions of the policy's state distribution. We evaluate the proposed approach on the CARLA NoCrash benchmark, focusing on the most challenging driving scenarios with dense pedestrian and vehicle traffic. Our approach improves the driving success rate by 16% over the state of the art, achieving 87% of the expert performance while also reducing the collision rate by an order of magnitude, without using any additional modality, auxiliary tasks, architectural modifications, or reward from the environment.
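
To make the two ideas above concrete, the sketch below shows one possible realization in Python: critical states are selected by how much the policy's action deviates from the expert's (a simple stand-in for the utility measure), and the replay buffer samples proportionally to an MC-dropout uncertainty estimate. All function names, thresholds, and the specific uncertainty proxy are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (assumptions, not the authors' implementation) of:
# (1) keeping only "critical" on-policy states, scored here by deviation
#     from the expert action, and
# (2) a replay buffer whose sampling weights track the policy's predictive
#     uncertainty, approximated with MC-dropout variance.
import numpy as np
import torch


def mc_dropout_uncertainty(policy, obs, n_samples=10):
    """Epistemic-uncertainty proxy: variance over stochastic forward passes
    with dropout kept active at inference time."""
    policy.train()  # keep dropout layers stochastic
    with torch.no_grad():
        preds = torch.stack([policy(obs) for _ in range(n_samples)])
    return preds.var(dim=0).mean(dim=-1)  # one scalar per state


def select_critical_states(states, expert_actions, policy_actions, tau=0.1):
    """Keep states where the policy's action deviates most from the expert's,
    a simple stand-in for 'utility to the learned policy'."""
    deviation = np.linalg.norm(policy_actions - expert_actions, axis=-1)
    keep = deviation > tau
    return [s for s, k in zip(states, keep) if k]


class UncertaintyReplayBuffer:
    """Replay buffer that progressively focuses sampling on high-uncertainty
    regions of the policy's state distribution."""

    def __init__(self, capacity=50_000):
        self.capacity = capacity
        self.data, self.weights = [], []

    def add(self, samples, uncertainties):
        for s, u in zip(samples, uncertainties):
            self.data.append(s)
            self.weights.append(float(u))
        # drop the oldest entries once over capacity
        self.data = self.data[-self.capacity:]
        self.weights = self.weights[-self.capacity:]

    def sample(self, batch_size):
        # sample proportionally to uncertainty (small epsilon avoids
        # a degenerate all-zero weight vector)
        p = np.asarray(self.weights) + 1e-6
        p = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx]
```

In an aggregation loop, one would score each on-policy rollout against the expert, keep only the critical states, push them into the buffer together with their uncertainties, and train on batches drawn from `buffer.sample(batch_size)`.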
