Using informative behavior to increase engagement in the TAMER framework

In this paper, we address a relatively unexplored aspect of designing agents that learn from human training by investigating how the agent's non-task behavior can elicit human feedback of higher quality and greater quantity. We use the TAMER framework, which facilitates the training of agents through human-generated reward signals, i.e., judgments of the quality of the agent's actions, as the foundation for our investigation. Building on this framework, we propose two new training interfaces designed to increase trainers' active involvement in the training process and thereby improve the agent's task performance: one provides information on the agent's uncertainty, the other on its performance. Our results from a 51-subject user study show that these interfaces can induce trainers to train longer and give more feedback. The agent's performance, however, improves only with the addition of performance-oriented information, not with the sharing of uncertainty levels. Subsequent analysis of our results suggests that the organizational maxim about human behavior, "you get what you measure" (i.e., sharing metrics with people causes them to focus on maximizing or minimizing those metrics while de-emphasizing other objectives), also applies to the training of agents, providing a powerful guiding principle for human-agent interface design in general.
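
The abstract only names the TAMER framework; as a rough illustration of what learning from human-generated reward signals looks like, the sketch below shows a minimal TAMER-style learner under simplifying assumptions (a linear model over hand-coded state features, one weight vector per action, a fixed learning rate). It fits a predictor of the trainer's reward and acts greedily with respect to that predictor; it is a minimal sketch, not the implementation used in the paper.

```python
import numpy as np

class TamerAgent:
    """Minimal TAMER-style learner: fit a model of the human trainer's
    reward and act greedily with respect to that model."""

    def __init__(self, n_features, n_actions, learning_rate=0.1):
        # One linear reward model per action: H_hat(s, a) = w[a] . phi(s)
        self.weights = np.zeros((n_actions, n_features))
        self.learning_rate = learning_rate

    def predict_human_reward(self, features):
        # Predicted human reward for every action in the current state.
        return self.weights @ features

    def act(self, features):
        # Act greedily on the predicted human reward; the human signal is
        # treated as a complete judgment of the action, not discounted
        # future return.
        return int(np.argmax(self.predict_human_reward(features)))

    def update(self, features, action, human_reward):
        # Supervised update: move the prediction for the taken action
        # toward the reward the trainer actually gave.
        error = human_reward - self.weights[action] @ features
        self.weights[action] += self.learning_rate * error * features
```

Here, `features` stands in for whatever hand-coded state representation the task provides, and `human_reward` is the scalar judgment the trainer gives for the most recent action; both names are placeholders introduced only for this example.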
