Model-based versus Model-free Deep Reinforcement Learning for Autonomous Racing Cars

Despite the rich theoretical foundation of model-based deep reinforcement learning (RL) agents, their effectiveness in real-world robotics applications is less studied and understood. In this paper we therefore investigate how such agents generalize to real-world autonomous-vehicle control tasks, where advanced model-free deep RL algorithms fail. In particular, we set up a series of lap-time tasks for an F1TENTH racing robot, equipped with high-dimensional LiDAR sensors, on a set of test tracks of gradually increasing complexity. In this continuous-control setting, we show that model-based agents, capable of learning in imagination, substantially outperform model-free agents with respect to performance, sample efficiency, successful task completion, and generalization. Moreover, we show that the generalization ability of model-based agents strongly depends on the choice of observation model. Finally, we provide extensive empirical evidence for the effectiveness of model-based agents equipped with sufficiently long memory horizons in sim2real tasks.
