Policy Shaping with Supervisory Attention Driven Exploration

Robots deployed for long periods need to explore and learn from their environment. One approach is reinforcement learning (RL), in which a robot receives rewards from the environment and uses them to learn which actions are optimal. When human supervision is available, interactive reinforcement learning speeds learning by soliciting feedback from a human teacher. However, this approach typically assumes continuous supervision, which is unlikely to hold in long-term deployments. We propose an extension to policy shaping, a method of interactive reinforcement learning, that takes human attention into account. Our approach improves unattended performance by favoring information-gathering actions while a human is attending and actions that have received positive feedback while no one is. We test our approach both in simulation and on a robot, and find that it learns faster than policy shaping and behaves more safely while no one is paying attention to the robot.
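The attention-dependent action choice described above can be illustrated with a minimal sketch. It is not the paper's implementation. The function names, the `consistency` parameter, the feedback-count data layout, and the "least-labeled action" rule for information gathering are all assumptions made for illustration. The only borrowed piece is the standard policy-shaping estimate that an action is optimal given the difference between its positive and negative labels.

```python
def feedback_prob(delta, consistency=0.8):
    """Policy-shaping style estimate that an action is optimal.

    delta is (#positive labels - #negative labels); consistency is the
    assumed probability that a single human label is correct.
    """
    c = consistency
    return c ** delta / (c ** delta + (1 - c) ** delta)


def choose_action(feedback, attended, consistency=0.8):
    """Pick an action for one state.

    feedback maps action -> (positive_count, negative_count).
    Hypothetical rule: when a human is attending, try the action with the
    fewest labels so far (information gathering); when unattended, take
    the action the feedback most strongly supports.
    """
    if attended:
        return min(feedback, key=lambda a: sum(feedback[a]))
    return max(
        feedback,
        key=lambda a: feedback_prob(feedback[a][0] - feedback[a][1], consistency),
    )


feedback = {"left": (3, 0), "right": (0, 2), "wait": (0, 0)}
attended_choice = choose_action(feedback, attended=True)
unattended_choice = choose_action(feedback, attended=False)
```

With these example counts, the attended choice is the unlabeled action `wait`, and the unattended choice is `left`, the only action with net positive feedback. A fuller version would also combine the feedback estimate with a learned RL policy, as policy shaping does.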
