Active Attention-Modified Policy Shaping: Socially Interactive Agents Track

We present the Active Attention-Modified Policy Shaping (Active AMPS) algorithm, which allows learning robots to request feedback from multi-tasking human teachers. Active AMPS supplements reinforcement learning with feedback from teachers while avoiding frequent interruptions of the teacher. It does so by selectively asking for the teacher's attention in low-information areas of the state space when there is uncertainty about the teacher's feedback. Active AMPS allows people to take breaks from teaching the robot to complete other tasks, and it is forgiving of lapses in human attention when learning occurs over long periods of time. We test Active AMPS both in simulation and on a physical robot in a human study. In simulation, Active AMPS outperforms Attention-Modified Policy Shaping (AMPS), achieving an 11.0% increase in area under its learning curve while receiving 89.9% less feedback. In the human study, we find statistically significant results showing that Active AMPS allows people to complete 77.5% more work than AMPS while the robot receives 48.5% less feedback, without decreasing performance.
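The query rule described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the feedback-count threshold `min_count`, the entropy threshold, and the use of binary entropy as the uncertainty measure are all assumptions made for illustration. The only borrowed piece is the feedback-policy estimate from the original Policy Shaping formulation, where an action with net feedback Δ and teacher consistency C is judged optimal with probability C^Δ / (C^Δ + (1 − C)^Δ).

```python
import math


def feedback_policy(delta: int, consistency: float = 0.8) -> float:
    """Policy Shaping estimate that an action is optimal.

    delta is (#positive - #negative) feedback for the action;
    consistency must lie strictly between 0 and 1.
    """
    c_pow = consistency ** delta
    return c_pow / (c_pow + (1.0 - consistency) ** delta)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; 0 at the extremes."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def should_request_attention(
    state_feedback: dict,           # action -> (positive count, negative count)
    teacher_attending: bool,
    min_count: int = 3,             # hypothetical "low-information" cutoff
    entropy_threshold: float = 0.9  # hypothetical uncertainty cutoff, in bits
) -> bool:
    """Assumed query rule: ask only when the teacher is away, the state
    has little feedback, and the best action is still uncertain."""
    if teacher_attending:
        return False  # no need to interrupt a teacher who is already watching
    total = sum(pos + neg for pos, neg in state_feedback.values())
    if total >= min_count:
        return False  # state is not low-information under this assumption
    best_p = max(feedback_policy(pos - neg) for pos, neg in state_feedback.values())
    return binary_entropy(best_p) >= entropy_threshold


# A state with no feedback yet triggers a request; a well-labelled one does not.
unseen = {"left": (0, 0), "right": (0, 0)}
labelled = {"left": (5, 0), "right": (0, 1)}
print(should_request_attention(unseen, teacher_attending=False))
print(should_request_attention(labelled, teacher_attending=False))
```

Gating on both conditions is the point of the sketch: a feedback-count check alone would interrupt the teacher in states where the little feedback received is already decisive, while an uncertainty check alone would keep querying heavily visited states with conflicting feedback.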
