Reinforcement learning with multi-fidelity simulators

We present a framework for reinforcement learning (RL) in a scenario where multiple simulators are available with decreasing amounts of fidelity to the real-world learning scenario. Our framework is designed to limit the number of samples used in each successively higher-fidelity/cost simulator by allowing the agent to choose to run trajectories at the lowest level that will still provide it with useful information. The approach transfers state-action Q-values from lower-fidelity models as heuristics for the "Knows What It Knows" (KWIK) family of RL algorithms, and is applicable over a wide range of possible dynamics and reward representations. Theoretical proofs of the framework's sample complexity are given, and empirical results are demonstrated on a remote-controlled car with multiple simulators. The approach allows RL algorithms to find near-optimal policies for the real world with fewer expensive real-world samples than previous transfer approaches or learning without simulators.
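
The transfer idea lends itself to a compact illustration. The sketch below is not the paper's KWIK-based algorithm; it is a minimal, hypothetical tabular example in which Q-values learned in a cheaper simulator seed learning at the next fidelity level, inflated by an assumed per-level bound (`fidelity_gap`) so the transferred values act as optimistic heuristics. All names (`q_learning`, `multi_fidelity_train`, `make_chain`), the uniform inflation, and the toy chain environment are illustrative assumptions, not details from the paper.

```python
import numpy as np

def q_learning(env_step, reset, q_init, n_actions,
               episodes=200, horizon=50, alpha=0.1, gamma=0.95, eps=0.1):
    """Plain tabular Q-learning, started from a (possibly transferred) Q table."""
    rng = np.random.default_rng(0)
    q = q_init.copy()
    for _ in range(episodes):
        s = reset()
        for _ in range(horizon):
            # epsilon-greedy action selection over the current Q table
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q[s]))
            s2, r, done = env_step(s, a)
            target = r + gamma * (0.0 if done else np.max(q[s2]))
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if done:
                break
    return q

def multi_fidelity_train(levels, n_states, n_actions, fidelity_gap=1.0):
    """Train at each fidelity level in turn (cheapest first), seeding each level
    with the previous level's Q-values inflated by an assumed fidelity-gap bound
    so the transferred values remain optimistic heuristics."""
    q = np.zeros((n_states, n_actions))          # lowest level starts from scratch
    for i, (env_step, reset) in enumerate(levels):
        if i > 0:
            q = q + fidelity_gap                 # hypothetical optimism margin between levels
        q = q_learning(env_step, reset, q, n_actions)
    return q

if __name__ == "__main__":
    # Toy 5-state chain: action 1 moves right, action 0 stays; reaching state 4
    # ends the episode. The "low-fidelity" level underestimates the goal reward.
    N_S, N_A = 5, 2

    def make_chain(goal_reward):
        def step(s, a):
            s2 = min(s + 1, N_S - 1) if a == 1 else s
            done = s2 == N_S - 1
            return s2, (goal_reward if done else 0.0), done
        return step, (lambda: 0)

    levels = [make_chain(0.5), make_chain(1.0)]  # cheap simulator, then "real" rewards
    q = multi_fidelity_train(levels, N_S, N_A)
    print("greedy policy:", np.argmax(q, axis=1))
```

Unlike this one-pass sketch, the framework described in the paper also decides when to drop back to a cheaper simulator and uses KWIK-style learners with formal guarantees on the number of samples drawn at each level.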
