Interactive Reinforcement Learning from Imperfect Teachers

Robots can use information from people to learn faster or to learn better policies. However, human teachers can have short attention spans and can misunderstand the task. Our work addresses these issues with algorithms for learning from inattentive teachers, which exploit feedback whenever people are present, and an algorithm for learning from inaccurate teachers, which estimates which state-action pairs have received incorrect feedback. These advances improve robots' ability to learn from imperfect feedback given by human teachers.
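As a rough illustration of the second idea, the sketch below combines ordinary Q-learning with a policy-shaping-style distribution computed from binary human feedback, and exposes a hook for down-weighting feedback on state-action pairs suspected of being labeled incorrectly. This is not the authors' algorithm: the class name, parameters, and the per-(state, action) reliability hook are illustrative assumptions, and only the general policy-shaping combination rule is taken from the literature.

```python
import numpy as np
from collections import defaultdict

class ShapedQLearner:
    """Minimal sketch (assumed names and defaults): Q-learning plus a
    policy-shaping-style feedback distribution, with a per-(s, a)
    reliability parameter that an inaccuracy estimator could adjust."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, tau=0.5, default_c=0.8):
        self.n_actions = n_actions
        self.alpha = alpha          # Q-learning step size
        self.gamma = gamma          # discount factor
        self.tau = tau              # Boltzmann temperature for the agent's own policy
        self.q = defaultdict(lambda: np.zeros(n_actions))
        # Net feedback count per (state, action): (#positive - #negative).
        self.delta = defaultdict(lambda: np.zeros(n_actions))
        # Assumed probability that feedback on (state, action) is correct.
        self.c = defaultdict(lambda: np.full(n_actions, default_c))

    def record_feedback(self, state, action, positive):
        """Tally one piece of binary human feedback for (state, action)."""
        self.delta[state][action] += 1 if positive else -1

    def set_reliability(self, state, action, c):
        """Hypothetical hook for an inaccuracy estimator: pushing c toward 0.5
        makes feedback on this pair uninformative without discarding it."""
        self.c[state][action] = np.clip(c, 0.5, 1.0 - 1e-6)

    def feedback_policy(self, state):
        """P(a is optimal | feedback) proportional to c^delta / (c^delta + (1-c)^delta)."""
        c, d = self.c[state], self.delta[state]
        num = np.power(c, d)
        probs = num / (num + np.power(1.0 - c, d))
        return probs / probs.sum()

    def agent_policy(self, state):
        """Boltzmann distribution over the agent's current Q-values."""
        prefs = self.q[state] / self.tau
        prefs -= prefs.max()        # numerical stability
        exp = np.exp(prefs)
        return exp / exp.sum()

    def act(self, state, rng):
        """Sample from the (renormalized) product of agent and feedback policies."""
        combined = self.agent_policy(state) * self.feedback_policy(state)
        combined /= combined.sum()
        return rng.choice(self.n_actions, p=combined)

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update from environment reward."""
        target = reward + self.gamma * self.q[next_state].max()
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In this sketch, lowering the reliability parameter for a suspect state-action pair toward 0.5 neutralizes its feedback distribution, so the agent falls back on its own value estimates there while still exploiting feedback elsewhere.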
