Paper Information - Let's Do the Time Warp Again: Human Action Assistance for Reinforcement Learning Agents

Reinforcement learning (RL) agents may take a long time to learn a policy for a complex task. One way to help the agent converge on a policy faster is to offer it assistance from a teacher who already has some expertise in the same task. The teacher can be either a human or another computer agent, and can provide assistance by controlling the reward, the action selection, or the state definition that the agent sees. However, some forms of assistance may come more naturally from a human teacher than from a computer teacher, and vice versa. For instance, a challenge for human teachers in providing action selection is that, because computers and humans operate at different time scales, it is difficult to translate what a human perceives as an action selection for a particular state into the corresponding action selection for the computer agent. In this paper, we introduce a system called Time Warp that allows a human teacher to provide action selection assistance to the RL agent during critical moments of its training. We find that Time Warp helps the agent develop a better policy in less time than an RL agent with no assistance, and that it rivals the performance of computer teaching agents. Time Warp also achieves these results with only ten minutes of human training.

Rebecca Hwa | Frederick L. Crabbe | Carter B. Burn
