I Know What You Meant: Learning Human Objectives by (Under)estimating Their Choice Set

Assistive robots have the potential to help people perform everyday tasks. However, these robots first need to learn what their user wants them to do. Teaching assistive robots is hard for inexperienced users, elderly users, and users living with physical disabilities, since these individuals are often unable to teleoperate the robot along their desired behavior. We know that inclusive learners should give human teachers credit for what they cannot demonstrate. But today's robots do the opposite: they assume every user is capable of providing any demonstration. As a result, these robots learn to mimic the demonstrated behavior, even when that behavior isn't what the human really meant! We propose an alternate approach to reward learning: robots that reason about the user's demonstrations in the context of similar or simpler alternatives. Unlike prior works, which err towards overestimating the human's capabilities, here we err towards underestimating what the human can input (i.e., their choice set). Our theoretical analysis proves that underestimating the human's choice set is risk-averse, with better worst-case performance than overestimating. We formalize three properties to generate similar and simpler alternatives; across simulations and a user study, our algorithm better enables robots to extrapolate the human's objective. See our user study here: this https URL
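As a concrete illustration of the idea sketched in the abstract, below is a minimal sketch of reward inference relative to a choice set, assuming the standard Boltzmann-rational (reward-rational choice) likelihood. The helpers here are hypothetical stand-ins: `reward_fn`, the candidate parameters `thetas`, the rationality coefficient `beta`, and the toy way the alternative trajectories are chosen are all illustrative assumptions, and the paper's three properties for generating similar and simpler alternatives are not reproduced here.

```python
import numpy as np

def boltzmann_likelihood(reward_demo, rewards_alternatives, beta=5.0):
    """Probability that a Boltzmann-rational user picks the demonstration
    out of the choice set (demonstration included), given one reward setting."""
    rewards = np.asarray([reward_demo] + list(rewards_alternatives), dtype=float)
    logits = beta * rewards
    # log-sum-exp for numerical stability
    log_norm = logits.max() + np.log(np.exp(logits - logits.max()).sum())
    return float(np.exp(logits[0] - log_norm))

def infer_reward_posterior(demo, choice_set, reward_fn, thetas, beta=5.0):
    """Posterior over candidate reward parameters given one demonstration,
    interpreted relative to the alternatives in `choice_set` (uniform prior)."""
    likelihoods = []
    for theta in thetas:
        r_demo = reward_fn(demo, theta)
        r_alts = [reward_fn(xi, theta) for xi in choice_set]
        likelihoods.append(boltzmann_likelihood(r_demo, r_alts, beta))
    posterior = np.asarray(likelihoods)
    return posterior / posterior.sum()

# Toy example (illustrative only): trajectories are scalars and the reward
# is -|x - theta|, i.e., the user wants the robot to reach the goal theta.
reward_fn = lambda x, theta: -abs(x - theta)
thetas = [0.0, 0.5, 1.0]                    # candidate goals
demo = 0.6                                  # what the user actually managed to input
underestimated = [0.4, 0.5, 0.7]            # similar/simpler alternatives the user could have given
overestimated = [0.0, 0.4, 0.5, 0.7, 1.0]   # also includes inputs beyond the user's ability

print(infer_reward_posterior(demo, underestimated, reward_fn, thetas))
print(infer_reward_posterior(demo, overestimated, reward_fn, thetas))
```

In this model, the choice set determines what the demonstration is implicitly compared against: overestimating it (including trajectories the user could never teleoperate) makes the demonstration look relatively poor under rewards it only partially achieves, whereas underestimating it credits the demonstration for outperforming the alternatives the user could plausibly have provided.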
