Learning Accuracy and Availability of Humans Who Help Mobile Robots

When mobile robots perform tasks in environments with humans, it seems appropriate for the robots to rely on those humans for help instead of dedicated human oracles or supervisors. However, such humans are neither always available nor always accurate. In this work, we model human help concretely as providing observations of the robot's state, reducing the robot's state uncertainty as it autonomously executes its policy. We model the probability of receiving an observation from a human in terms of that human's availability and accuracy by introducing Human Observation Provider POMDPs (HOP-POMDPs). We contribute an algorithm to learn human availability and accuracy online while the robot executes its current task policy. We demonstrate that our algorithm effectively approximates the true availability and accuracy of humans without depending on oracles to learn, thus increasing the tractability of deploying a robot that can occasionally ask for help.
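
The abstract states only that availability and accuracy are learned online while the robot executes its policy; it does not give the update rule. Below is a minimal sketch of one plausible count-based (Beta-Bernoulli) estimator, kept per human helper. The `HumanModel` class and its `record_ask` and `record_verification` hooks are hypothetical names introduced here for illustration; this is not the paper's HOP-POMDP learning algorithm.

```python
from dataclasses import dataclass


@dataclass
class HumanModel:
    """Count-based (Beta-Bernoulli) estimates for one human helper."""
    asked: int = 0      # times the robot asked this human for an observation
    answered: int = 0   # times the human actually responded
    checked: int = 0    # answers the robot could later verify
    correct: int = 0    # verified answers that turned out to be correct

    @property
    def availability(self) -> float:
        # Posterior mean under a uniform Beta(1, 1) prior (Laplace smoothing),
        # so the estimate is well defined before any data arrives.
        return (self.answered + 1) / (self.asked + 2)

    @property
    def accuracy(self) -> float:
        return (self.correct + 1) / (self.checked + 2)

    def record_ask(self, responded: bool) -> None:
        self.asked += 1
        if responded:
            self.answered += 1

    def record_verification(self, was_correct: bool) -> None:
        # Invoked once the robot can check an earlier answer, e.g. after it
        # relocalizes and learns what the true state actually was.
        self.checked += 1
        if was_correct:
            self.correct += 1


# Example usage: one ask that was answered and later verified as correct.
helper = HumanModel()
helper.record_ask(responded=True)
helper.record_verification(was_correct=True)
print(helper.availability, helper.accuracy)
```

In this sketch, availability is estimated from how often a request is answered at all, and accuracy from the subset of answers the robot can later check against its own state estimate, so no external oracle is needed.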
