Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation

Human explanation (e.g., in terms of feature importance) has recently been used to extend the communication channel between human and agent in interactive machine learning. In this setting, human trainers provide not only ground-truth labels but also some form of explanation. However, this kind of human guidance has so far been investigated only in supervised learning tasks, and it remains unclear how best to incorporate this type of human knowledge into deep reinforcement learning. In this paper, we present the first study of using human visual explanations in human-in-the-loop reinforcement learning (HRL). We focus on learning from feedback, in which the human trainer not only gives binary evaluative "good" or "bad" feedback for queried state-action pairs, but also provides a visual explanation by annotating the relevant features in images. We propose EXPAND (EXPlanation AugmeNted feeDback), which encourages the model to encode task-relevant features through a context-aware data augmentation that perturbs only the features the human annotation marks as irrelevant. We evaluate the performance and sample efficiency of this approach on five tasks: Pixel-Taxi and four Atari games. Our method significantly outperforms both explanation-leveraging methods adapted from supervised learning and human-in-the-loop RL baselines that use only evaluative feedback.
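The core mechanism is the context-aware augmentation: given a human-annotated saliency mask over an image observation, perturb only the unannotated (irrelevant) pixels so that the agent is pushed to rely on the annotated regions. A minimal sketch is shown below; the function name `context_aware_augment` and the choice of Gaussian noise as the perturbation are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of context-aware data augmentation (assumed details):
# perturb only pixels the human did NOT annotate as relevant, leaving
# the annotated salient regions untouched. Gaussian noise is used here
# purely as a stand-in perturbation.
import numpy as np

def context_aware_augment(image, saliency_mask, noise_std=0.05, rng=None):
    """Return an augmented copy of `image` with only irrelevant pixels perturbed.

    image          -- float array in [0, 1], shape (H, W, C)
    saliency_mask  -- binary array, shape (H, W); 1 = human-annotated relevant
    """
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, noise_std, size=image.shape)
    irrelevant = (saliency_mask == 0)[..., None]              # broadcast over channels
    augmented = image + noise * irrelevant                    # noise applied off-mask only
    return np.clip(augmented, 0.0, 1.0)

# Usage on an 84x84 Atari-style frame with one annotated region.
frame = np.random.rand(84, 84, 1).astype(np.float32)
mask = np.zeros((84, 84), dtype=np.uint8)
mask[30:50, 30:50] = 1                                         # human-marked relevant patch
aug = context_aware_augment(frame, mask)
assert np.allclose(aug[30:50, 30:50], frame[30:50, 30:50])     # salient pixels unchanged
```

One natural way to use such augmented frames (an assumption here, since the abstract does not spell it out) is to penalize differences between the agent's outputs on the original and augmented observations, making the learned policy invariant to changes in regions the human marked as irrelevant.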
