Model-based versus Model-free Deep Reinforcement Learning for Autonomous Racing Cars

Despite the rich theoretical foundation of model-based deep reinforcement learning (RL) agents, their effectiveness in real-world robotics applications is less studied and understood. In this paper we therefore investigate how such agents generalize to real-world autonomous-vehicle control tasks, where advanced model-free deep RL algorithms fail. In particular, we set up a series of lap-time tasks for an F1TENTH racing robot, equipped with high-dimensional LiDAR sensors, on a set of test tracks of gradually increasing complexity. In this continuous-control setting, we show that model-based agents, capable of learning in imagination, substantially outperform model-free agents with respect to performance, sample efficiency, successful task completion, and generalization. Moreover, we show that the generalization ability of model-based agents strongly depends on the choice of observation model. Finally, we provide extensive empirical evidence for the effectiveness of model-based agents equipped with sufficiently long memory horizons in sim2real tasks.
