Autonomous drifting using simulation-aided reinforcement learning

We introduce a framework that combines simple and complex continuous state-action simulators with a real-world robot to efficiently find good control policies, while minimizing the number of samples needed from the physical robot. The framework exploits the strengths of the various simulation levels by first finding optimal policies in a simple model, and then using that solution to initialize a gradient-based learner in a more complex simulation. The policy and transition dynamics from the complex simulation are in turn used to guide learning in the physical world. A method is developed for transferring information gathered in the physical world back to the learning agent in the simulation. The new information is used to re-evaluate whether the original simulated policy is still optimal given the updated knowledge from the real world. This reverse transfer is critical to minimizing samples from the physical world. The new framework is demonstrated on a robotic car learning to perform controlled drifting maneuvers. A video of the car's performance can be found at https://youtu.be/opsmd5yuBF0.
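To make the flow of information concrete, the following is a minimal Python sketch of the loop described above. Every name in it (simple_sim, complex_sim, robot.rollout, optimize_policy, gradient_refine, update_dynamics, expected_return) is a hypothetical placeholder introduced for illustration, not the paper's actual implementation.

```python
# Sketch of a multi-fidelity learning loop, under assumed interfaces:
# - optimize_policy(sim): globally solves a cheap, simple model
# - gradient_refine(sim, init_policy): local gradient-based improvement
# - robot.rollout(policy): executes on hardware, returns transitions
# - update_dynamics(sim, transitions): folds real data into the sim model
# - sim.expected_return(policy): evaluates a policy under the sim's model

def multi_fidelity_learn(simple_sim, complex_sim, robot,
                         optimize_policy, gradient_refine,
                         update_dynamics, max_real_episodes=10):
    # 1. Solve the simple model first: samples are cheap, dynamics coarse.
    policy = optimize_policy(simple_sim)

    # 2. Warm-start a gradient-based learner in the complex simulator.
    policy = gradient_refine(complex_sim, init_policy=policy)

    for _ in range(max_real_episodes):
        # 3. Execute the current policy on the physical robot and log
        #    (state, action, next_state) transitions.
        transitions = robot.rollout(policy)

        # 4. Reverse transfer: correct the complex simulator's dynamics
        #    model with the real-world data.
        update_dynamics(complex_sim, transitions)

        # 5. Re-evaluate: if the old policy is still optimal under the
        #    corrected dynamics, stop sampling the physical world.
        refined = gradient_refine(complex_sim, init_policy=policy)
        if complex_sim.expected_return(refined) <= \
           complex_sim.expected_return(policy):
            break  # new data no longer changes the policy
        policy = refined

    return policy
```

The point of steps 4 and 5 is the reverse transfer emphasized in the abstract: each physical rollout is spent correcting the simulator where it is wrong, so subsequent improvement happens in simulation rather than on hardware.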
