Imitation learning for agile autonomous driving

We present an end-to-end imitation learning system for agile, off-road autonomous driving using only low-cost on-board sensors. By imitating a model predictive controller equipped with advanced sensors, we train a deep neural network control policy to map raw, high-dimensional observations to continuous steering and throttle commands. Compared with recent approaches to similar tasks, our method requires neither state estimation nor on-the-fly planning to navigate the vehicle. Our approach relies on, and experimentally validates, recent imitation learning theory. Empirically, we show that policies trained with online imitation learning overcome well-known challenges related to covariate shift and generalize better than policies trained with batch imitation learning. Building on these insights, our autonomous driving system demonstrates successful high-speed off-road driving, matching state-of-the-art performance.
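To make the online-versus-batch distinction concrete, the sketch below illustrates a DAgger-style online imitation learning loop of the kind the abstract alludes to: the learner's current policy is rolled out, the expert (here standing in for the MPC controller with privileged sensors) relabels the states the learner actually visits, and the policy is refit on the aggregated data. This is a minimal illustration under stated assumptions, not the paper's implementation; `env_reset`, `env_step`, `mpc_expert`, and the linear policy are hypothetical stand-ins for the off-road simulator, the vehicle dynamics, the MPC expert, and the deep network.

```python
# Minimal sketch of DAgger-style online imitation learning (Ross et al., 2011).
# NOT the paper's implementation: the environment, expert, and policy below are
# hypothetical stand-ins chosen so the script runs end to end.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 2                          # observation -> (steering, throttle)
W_expert = rng.normal(size=(OBS_DIM, ACT_DIM))   # stand-in for the MPC expert's mapping

def env_reset():
    # Hypothetical: sample an initial observation.
    return rng.normal(size=OBS_DIM)

def env_step(obs, act):
    # Toy dynamics: state drift plus a small action influence and noise.
    act_effect = np.tile(act, OBS_DIM // ACT_DIM)
    return 0.9 * obs + 0.05 * act_effect + 0.1 * rng.normal(size=OBS_DIM)

def mpc_expert(obs):
    # Expert action queried on the *learner's* visited state; this on-policy
    # relabeling is what lets online imitation learning address covariate shift.
    return obs @ W_expert

def fit_policy(X, Y):
    # Least-squares regression as a stand-in for training the deep policy.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

X, Y = [], []
W = np.zeros((OBS_DIM, ACT_DIM))                 # initial (untrained) policy
for iteration in range(10):                      # DAgger iterations
    obs = env_reset()
    for t in range(50):                          # roll out the learner's own policy
        act = obs @ W
        X.append(obs)
        Y.append(mpc_expert(obs))                # expert relabels the visited state
        obs = env_step(obs, act)
    W = fit_policy(np.array(X), np.array(Y))     # refit on the aggregated dataset
```

In contrast, batch (behavioral cloning) training would collect labels only along the expert's own trajectories, so the learner never sees corrective labels for the states its mistakes lead to, which is the covariate-shift failure mode the abstract describes.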
