Bayesian Active Learning for Collaborative Task Specification Using Equivalence Regions

Specifying complex task behaviors while ensuring good robot performance can be difficult for untrained users. We study a framework that lets users specify rules for acceptable robot behavior in shared environments such as industrial facilities. Since non-expert users may have little intuition about how their specification impacts the robot's performance, we design a learning system that interacts with the user to find an optimal solution. Using active preference learning, we iteratively show the user, on an interface, alternative paths that the robot could take. From the user's rankings of these alternatives, we learn the weights that the user places on each part of the specification. We extend the user model from our previous work to a discrete Bayesian learning model and introduce a greedy algorithm for proposing alternatives that operates on the notion of equivalence regions of user weights. We prove that with this algorithm the active learning process for specification revision converges to the user-optimal path. In simulations of realistic industrial environments, we demonstrate the convergence and robustness of our approach.
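As a rough illustration of the learning loop described above (a sketch, not the paper's implementation), the following Python snippet maintains a discrete posterior over candidate weight vectors, groups weights into equivalence regions by which path they rank optimal, greedily queries the user with the two paths whose regions carry the most posterior mass, and performs a noisy Bayesian update on the user's choice. All names, the linear cost model w . phi(path), the noise level, and the mass-splitting query criterion are assumptions made for illustration.

    import numpy as np

    def path_cost(w, phi):
        # assumed linear model: a path with features phi costs w . phi
        return float(np.dot(w, phi))

    def optimal_path(w, paths):
        # index of the path that this candidate weight vector ranks best
        return int(np.argmin([path_cost(w, phi) for phi in paths]))

    def equivalence_regions(weights, paths):
        # group candidate weights by their induced optimal path; weights in
        # the same group are equivalent from the planner's point of view
        regions = {}
        for i, w in enumerate(weights):
            regions.setdefault(optimal_path(w, paths), []).append(i)
        return regions

    def greedy_query(posterior, weights, paths, eps=1e-6):
        # heuristic stand-in for the greedy criterion: propose the two
        # candidate-optimal paths whose regions hold the most posterior mass
        regions = equivalence_regions(weights, paths)
        mass = {p: sum(posterior[i] for i in idx) for p, idx in regions.items()}
        live = [p for p in mass if mass[p] > eps]
        if len(live) < 2:
            return None  # mass concentrated in one region: converged
        live.sort(key=lambda p: mass[p], reverse=True)
        return live[0], live[1]

    def bayes_update(posterior, weights, paths, a, b, preferred, noise=0.1):
        # noisy-user likelihood: with probability 1 - noise the user picks
        # the path that their true weights rank lower-cost
        post = posterior.copy()
        for i, w in enumerate(weights):
            model_pref = a if path_cost(w, paths[a]) <= path_cost(w, paths[b]) else b
            post[i] *= (1.0 - noise) if model_pref == preferred else noise
        return post / post.sum()

    # Toy usage: random 3-feature paths, a sampled set of candidate weight
    # vectors, and a simulated user with hidden true weights.
    rng = np.random.default_rng(0)
    paths = rng.random((8, 3))      # feature vectors of 8 candidate paths
    weights = rng.random((50, 3))   # candidate user weight vectors
    posterior = np.full(len(weights), 1.0 / len(weights))
    true_w = weights[17]            # pretend this is the real user

    for _ in range(20):
        query = greedy_query(posterior, weights, paths)
        if query is None:
            break
        a, b = query
        preferred = a if path_cost(true_w, paths[a]) <= path_cost(true_w, paths[b]) else b
        posterior = bayes_update(posterior, weights, paths, a, b, preferred)

    print("estimated user-optimal path:",
          optimal_path(weights[np.argmax(posterior)], paths))

In this sketch, convergence means the posterior concentrates on a single equivalence region, after which all remaining candidate weights agree on the same optimal path, mirroring the convergence property claimed for the paper's algorithm.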