INQUIRE: INteractive Querying for User-aware Informative REasoning

Research on Interactive Robot Learning has yielded several modalities for querying a human for training data, including demonstrations, preferences, and corrections. While prior work in this space has focused on optimizing the robot's queries within each interaction type, there has been little work on optimizing the selection of the interaction type itself. We present INQUIRE, the first algorithm to implement and optimize over a generalized representation of information gain across multiple interaction types. Our evaluations show that INQUIRE can dynamically select its interaction type (and the corresponding optimal query) based on its current learning status and the robot's state in the world, resulting in more robust performance across tasks compared with state-of-the-art baseline methods. Additionally, INQUIRE supports customizable cost metrics that bias its selection of interaction types, allowing the algorithm to be tailored to a robot's particular deployment domain and to formulate cost-aware, informative queries.
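Since the abstract's core idea is scoring queries from every interaction type with one information-gain measure and weighting that score by a domain-specific cost, a minimal sketch may help make it concrete. Everything below is an illustrative assumption rather than the paper's implementation: the particle belief over linear reward weights, the Boltzmann choice model, the `expected_info_gain` estimator, and the cost values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical belief over linear reward weights, kept as particles.
M, D = 50, 4                      # number of particles, feature dimension
particles = rng.normal(size=(M, D))
particles /= np.linalg.norm(particles, axis=1, keepdims=True)

def choice_likelihoods(feats, w):
    """Boltzmann-rational probability of the human picking each
    trajectory (rows of `feats`) under reward weights `w`.
    (An assumed choice model, common in reward-learning work.)"""
    r = feats @ w
    e = np.exp(r - r.max())
    return e / e.sum()

def expected_info_gain(feats):
    """Mutual information between the human's answer and the weight
    belief, estimated over the particle set:
    I(answer; w) = H(answer) - E_w[H(answer | w)]."""
    P = np.stack([choice_likelihoods(feats, w) for w in particles])  # M x K
    p_ans = P.mean(axis=0)                                           # K
    H_marg = -np.sum(p_ans * np.log(p_ans + 1e-12))
    H_cond = -np.mean(np.sum(P * np.log(P + 1e-12), axis=1))
    return H_marg - H_cond

def select_query(candidates, cost):
    """Pick the (interaction type, query) pair maximizing
    cost-weighted information gain.
    `candidates`: type -> list of K x D feature matrices (one per query).
    `cost`: type -> scalar cost chosen for the deployment domain."""
    scored = [((t, q), expected_info_gain(q) / cost[t])
              for t, qs in candidates.items() for q in qs]
    return max(scored, key=lambda x: x[1])[0]

# Toy usage: a preference query compares 2 trajectories; a demonstration
# is framed as a choice among many alternatives, so it carries more bits
# but (here) costs more for the human to provide.
candidates = {
    "preference":    [rng.normal(size=(2, D)) for _ in range(5)],
    "demonstration": [rng.normal(size=(10, D)) for _ in range(3)],
}
cost = {"preference": 1.0, "demonstration": 4.0}
itype, _ = select_query(candidates, cost)
print("chosen interaction type:", itype)
```

The point the abstract emphasizes is visible in `select_query`: a single information-gain measure puts queries of every interaction type on a common scale, and the cost dictionary biases which type the robot asks for in a given deployment domain.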
