Training a Tetris agent via interactive shaping: a demonstration of the TAMER framework

As computational learning agents continue to improve their ability to learn sequential decision-making tasks, a central but largely unfulfilled goal is to deploy these agents in real-world domains in which they interact with humans and make decisions that affect our lives. People will want such interactive agents to be able to perform tasks for which their original developers could not prepare them. Thus it will be imperative to develop agents that can learn from natural methods of communication. The teaching technique of shaping is one such method. In this context, we define shaping as training an agent through signals of positive and negative reinforcement. In a shaping scenario, a human trainer observes an agent and reinforces its behavior through button presses, spoken words ("yes" or "no"), facial expressions, or any other signal that can be converted to a scalar signal of approval or disapproval. We treat shaping as a specific mode of knowledge transfer, distinct from (and probably complementary to) other natural methods of communication, including programming by demonstration and advice-giving. The key challenge before us is to create agents that can be shaped effectively.
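To make the shaping protocol concrete, the sketch below shows one way a TAMER-style agent could be structured: it fits a model of the trainer's scalar reinforcement from the feedback signals described above and then acts greedily with respect to that model. This is a minimal illustration under assumed simplifications (a linear model, incremental least-squares updates), not the published TAMER implementation; all names here (TamerSketch, feature_fn, choose_action) are hypothetical.

```python
import numpy as np

class TamerSketch:
    """Illustrative shaped agent: learns a linear model H_hat(s, a) of the
    human trainer's scalar reinforcement and acts greedily on it."""

    def __init__(self, n_features, step_size=0.05):
        self.w = np.zeros(n_features)  # weights of the reinforcement model
        self.step_size = step_size

    def predict(self, features):
        """Predicted human reinforcement for one state-action feature vector."""
        return float(self.w @ features)

    def choose_action(self, state, actions, feature_fn):
        """Pick the action whose predicted trainer approval is highest."""
        return max(actions, key=lambda a: self.predict(feature_fn(state, a)))

    def update(self, features, human_reward):
        """Move the model toward the trainer's signal (incremental
        least-squares step on the prediction error)."""
        error = human_reward - self.predict(features)
        self.w += self.step_size * error * features
```

In use, each button press or equivalent signal would be converted to a scalar (e.g., +1 for approval, -1 for disapproval) and passed to update along with the features of the recent state-action pair; steps with no trainer feedback would simply trigger no update.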
