Imitation learning for agile autonomous driving

We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Building on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching state-of-the-art performance.
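To make the online-versus-batch distinction concrete, the sketch below illustrates a DAgger-style online imitation learning loop of the kind the abstract alludes to: the learner's current policy is rolled out, the expert (here standing in for the MPC controller with privileged sensors) relabels the states the learner actually visits, and the policy is refit on the aggregated data. This is a minimal illustration under stated assumptions, not the paper's implementation; `env_reset`, `env_step`, `mpc_expert`, and the linear policy are hypothetical stand-ins for the off-road simulator, the vehicle dynamics, the MPC expert, and the deep network.

```python
# Minimal sketch of DAgger-style online imitation learning (Ross et al., 2011).
# NOT the paper's implementation: the environment, expert, and policy below are
# hypothetical stand-ins chosen so the script runs end to end.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 2                          # observation -> (steering, throttle)
W_expert = rng.normal(size=(OBS_DIM, ACT_DIM))   # stand-in for the MPC expert's mapping

def env_reset():
    # Hypothetical: sample an initial observation.
    return rng.normal(size=OBS_DIM)

def env_step(obs, act):
    # Toy dynamics: state drift plus a small action influence and noise.
    act_effect = np.tile(act, OBS_DIM // ACT_DIM)
    return 0.9 * obs + 0.05 * act_effect + 0.1 * rng.normal(size=OBS_DIM)

def mpc_expert(obs):
    # Expert action queried on the *learner's* visited state; this on-policy
    # relabeling is what lets online imitation learning address covariate shift.
    return obs @ W_expert

def fit_policy(X, Y):
    # Least-squares regression as a stand-in for training the deep policy.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

X, Y = [], []
W = np.zeros((OBS_DIM, ACT_DIM))                 # initial (untrained) policy
for iteration in range(10):                      # DAgger iterations
    obs = env_reset()
    for t in range(50):                          # roll out the learner's own policy
        act = obs @ W
        X.append(obs)
        Y.append(mpc_expert(obs))                # expert relabels the visited state
        obs = env_step(obs, act)
    W = fit_policy(np.array(X), np.array(Y))     # refit on the aggregated dataset
```

In contrast, batch (behavioral cloning) training would collect labels only along the expert's own trajectories, so the learner never sees corrective labels for the states its mistakes lead to, which is the covariate-shift failure mode the abstract describes.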
