Social interaction for efficient agent learning from human reward

Learning from rewards generated by a human trainer who observes an agent in action has proven to be a powerful method for teaching autonomous agents to perform challenging tasks, especially for non-technical users. Because the efficacy of this approach depends critically on the reward the trainer provides, we consider how the interaction between the trainer and the agent should be designed to make the training process more efficient. This article investigates how the agent's socio-competitive feedback influences the human trainer's training behavior and the agent's learning. The results of our user study with 85 participants suggest that the agent's passive socio-competitive feedback (showing the performance and scores of agents trained by different trainers on a leaderboard) substantially increases participants' engagement in the game task and improves the agents' performance, even though the participants do not play the game directly but instead train an agent to play it. Moreover, making this feedback active (sending the trainer her agent's performance relative to that of other agents) induces more participants to train their agents for longer and further improves the agent's learning. Further analysis shows that agents trained by participants exposed to both passive and active social feedback can obtain higher performance under a score mechanism that the trainer could optimize from her own perspective, and that the agent's additional active social feedback keeps participants training their agents longer, so that the agents learn policies that obtain higher performance under such a score mechanism.
