Learning Situational Driving

Human drivers have a remarkable ability to drive in diverse visual conditions and situations, e.g., from maneuvering in rainy, limited-visibility conditions with no lane markings to turning at a busy intersection while yielding to pedestrians. In contrast, we find that state-of-the-art sensorimotor driving models struggle when encountering diverse settings with varying relationships between observation and action. To generalize when making decisions across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, we develop a framework for learning a situational driving policy that effectively captures reasoning under varying types of scenarios. Our key idea is to learn a mixture model with a set of policies that can capture multiple driving modes. We first optimize the mixture model through behavior cloning and show that it yields significant gains in driving performance across diverse conditions. We then refine the model by directly optimizing for the driving task itself, i.e., using the navigation task reward as supervision. Our method is more scalable than methods assuming access to privileged information, e.g., perception labels, as it requires only demonstration and reward-based supervision. We achieve a success rate of over 98% on the CARLA driving benchmark, as well as state-of-the-art performance on a newly introduced generalization benchmark.
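To make the key idea concrete, below is a minimal sketch (not the authors' code) of a mixture-of-policies driving model trained by behavior cloning. The module names, dimensions, and the simple MLP backbone are illustrative assumptions; the actual method operates on camera images with a learned perception backbone.

```python
import torch
import torch.nn as nn


class MixtureOfPolicies(nn.Module):
    """K expert policies plus a gating network that mixes their actions."""

    def __init__(self, obs_dim: int, act_dim: int, num_policies: int = 4):
        super().__init__()
        # One small action head per driving mode (e.g., lane following, turning).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim))
            for _ in range(num_policies)
        )
        # Gating network: situation-dependent mixture weights over the experts.
        self.gate = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, num_policies)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(obs), dim=-1)           # (B, K)
        actions = torch.stack([e(obs) for e in self.experts], 1)  # (B, K, A)
        return (weights.unsqueeze(-1) * actions).sum(dim=1)       # (B, A)


def behavior_cloning_step(model, optimizer, obs, expert_actions):
    """One behavior-cloning update: regress the mixed action onto the demonstration."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(obs), expert_actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```

For the refinement stage, the navigation task reward is not differentiable with respect to the policy parameters, so one plausible realization is to fine-tune a subset of the parameters (e.g., the gating network) with a gradient-free optimizer such as CMA-ES, evaluating each candidate by its episode reward in the simulator; this choice of optimizer is an assumption for illustration, not a statement of the paper's exact procedure.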
