Reinforcement learning with multi-fidelity simulators

We present a framework for reinforcement learning (RL) in a scenario where multiple simulators are available with decreasing amounts of fidelity to the real-world learning scenario. Our framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing the agent to choose to run trajectories at the lowest level that will still provide it with useful information. The approach transfers state-action Q-values from lower-fidelity models as heuristics for the "Knows What It Knows" (KWIK) family of RL algorithms, and is applicable over a wide range of possible dynamics and reward representations. Theoretical proofs of the framework's sample complexity are given, and empirical results are demonstrated on a remote-controlled car with multiple simulators. The approach allows RL algorithms to find near-optimal policies for the real world with fewer expensive real-world samples than previous transfer approaches or learning without simulators.
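
The transfer idea lends itself to a compact illustration. The sketch below is not the paper's KWIK-based algorithm; it is a minimal, hypothetical tabular example in which Q-values learned in a cheaper simulator seed learning at the next fidelity level, inflated by an assumed per-level bound (`fidelity_gap`) so the transferred values act as optimistic heuristics. All names (`q_learning`, `multi_fidelity_train`, `make_chain`), the uniform inflation, and the toy chain environment are illustrative assumptions, not details from the paper.

```python
import numpy as np

def q_learning(env_step, reset, q_init, n_actions,
               episodes=200, horizon=50, alpha=0.1, gamma=0.95, eps=0.1):
    """Plain tabular Q-learning, started from a (possibly transferred) Q table."""
    rng = np.random.default_rng(0)
    q = q_init.copy()
    for _ in range(episodes):
        s = reset()
        for _ in range(horizon):
            # epsilon-greedy action selection over the current Q table
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q[s]))
            s2, r, done = env_step(s, a)
            target = r + gamma * (0.0 if done else np.max(q[s2]))
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return q

def multi_fidelity_train(levels, n_states, n_actions, fidelity_gap=1.0):
    """Train at each fidelity level in turn (cheapest first), seeding each level
    with the previous level's Q-values inflated by an assumed fidelity-gap bound
    so the transferred values remain optimistic heuristics."""
    q = np.zeros((n_states, n_actions))          # lowest level starts from scratch
    for i, (env_step, reset) in enumerate(levels):
        if i > 0:
            q = q + fidelity_gap                 # hypothetical optimism margin between levels
        q = q_learning(env_step, reset, q, n_actions)
    return q

if __name__ == "__main__":
    # Toy 5-state chain: action 1 moves right, action 0 stays; reaching state 4
    # ends the episode. The "low-fidelity" level underestimates the goal reward.
    N_S, N_A = 5, 2

    def make_chain(goal_reward):
        def step(s, a):
            s2 = min(s + 1, N_S - 1) if a == 1 else s
            done = s2 == N_S - 1
            return s2, (goal_reward if done else 0.0), done
        return step, (lambda: 0)

    levels = [make_chain(0.5), make_chain(1.0)]  # cheap simulator, then "real" rewards
    q = multi_fidelity_train(levels, N_S, N_A)
    print("greedy policy:", np.argmax(q, axis=1))
```

Unlike this one-pass sketch, the framework described in the paper also decides when to drop back to a cheaper simulator and uses KWIK-style learners with formal guarantees on the number of samples drawn at each level.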
