Paper information - Reinforcement learning with value advice

Reinforcement learning with value advice

The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
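The setting described in the abstract can be illustrated with a minimal sketch of a DAgger-style data-aggregation loop driven by a value oracle. This is not the paper's RLAdvice implementation. Everything in it is an assumption made for illustration: the toy chain environment, the exact `oracle_value` function (standing in for UCT value estimates), the tabular averaging learner, and all names and constants.

```python
import random

# Hypothetical toy environment (not from the paper): a chain of N_STATES states.
# Action 1 moves right, action 0 moves left. Entering the last state gives
# reward 1 and ends the episode.
N_STATES = 6
ACTIONS = (0, 1)
GAMMA = 0.9
HORIZON = 20


def step(state, action):
    """Deterministic chain dynamics: returns (next_state, reward, done)."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def oracle_value(state, action):
    """Stand-in value oracle: exact optimal Q-value on the toy chain."""
    next_state, reward, done = step(state, action)
    if done:
        return reward
    # After this step the optimal policy walks right; the reward of 1 arrives
    # steps_to_goal steps later, so the discounted value is GAMMA ** steps_to_goal.
    steps_to_goal = N_STATES - 1 - next_state
    return GAMMA ** steps_to_goal


def fit(dataset):
    """Tabular regression: mean advised value per (state, action) pair."""
    sums, counts = {}, {}
    for state, action, value in dataset:
        key = (state, action)
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def greedy(q_table, state, rng):
    """Greedy action under the learned values; unseen states act randomly."""
    if all((state, a) not in q_table for a in ACTIONS):
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table.get((state, a), float("-inf")))


def train(iterations=5, seed=0):
    """DAgger-style loop: roll out the learner, label visited states, refit."""
    rng = random.Random(seed)
    dataset, q_table = [], {}
    for _ in range(iterations):
        state = 0
        for _ in range(HORIZON):
            # Query the oracle for every action in the state the learner visits.
            for action in ACTIONS:
                dataset.append((state, action, oracle_value(state, action)))
            # Act with the learner's current policy, so the aggregated data
            # covers the states the learner itself reaches.
            action = greedy(q_table, state, rng)
            state, _, done = step(state, action)
            if done:
                break
        q_table = fit(dataset)  # retrain on all data gathered so far
    return q_table


if __name__ == "__main__":
    q = train()
    policy = {s: greedy(q, s, random.Random(0)) for s in range(N_STATES - 1)}
    print(policy)
```

The design point the sketch tries to capture is that advice is requested on states visited by the learner's own policy rather than by an expert, and that each refit uses the full aggregated dataset. A realistic version would replace the lookup table with a function approximator and the exact oracle with noisy value estimates.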

Marcus Hutter | Peter Sunehag | Mayank Daswani
