Modeling humans as observation providers using POMDPs

Obtaining accurate observations while navigating in uncertain environments is a difficult challenge in deploying robots. Robots have traditionally relied on human supervisors who are always available to provide additional observations that reduce uncertainty. We are instead interested in obtaining observations from humans who are already in the environment. The challenge is to model these humans' availability and their higher interruption costs in order to determine when to query them during navigation. In this work, we introduce the Human Observation Provider POMDP framework (HOP-POMDP) and contribute new algorithms for planning and executing with HOP-POMDPs that account for the differences between humans and other probabilistic sensors that provide observations. We compare optimal HOP-POMDP policies, which plan for needing humans' observations, with oracle POMDP policies, which do not take human costs and availability into account. We show in benchmark tests and real-world environments that the oracle policies match the optimal HOP-POMDP policy 60% of the time and can be used when humans are likely to be available along the shortest paths. However, HOP-POMDP policies receive higher rewards in general because they account for the possibility that a human may be unavailable. A HOP-POMDP policy needs to be computed only once, prior to the robot's deployment, so it is feasible to precompute and use in practice.
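To make the framing concrete, the following is a minimal sketch of what a HOP-POMDP-style model could look like: a standard POMDP augmented with a per-location probability that a human is available and a cost charged when one is actually interrupted. This is an illustration under assumptions, not the paper's formalism; the class and field names (HOPPOMDP, availability, interruption_cost, query) are hypothetical.

```python
# Illustrative sketch (not the paper's notation): a POMDP navigation model
# extended with human availability and interruption costs per location.
from dataclasses import dataclass
import random


@dataclass
class HOPPOMDP:
    states: list              # robot locations
    actions: list             # movement actions plus an "ask" (query) action
    availability: dict        # state -> probability a human is present to answer
    interruption_cost: float  # cost incurred when a human is actually queried
    move_cost: float = 1.0    # baseline cost of a movement action

    def query(self, state, true_obs):
        """Simulate asking a human at `state` for an observation.

        If a human is available (with the location's availability
        probability), the robot receives the observation and pays the
        interruption cost; otherwise the query fails, yielding no
        observation and no interruption cost.
        """
        if random.random() < self.availability.get(state, 0.0):
            return true_obs, self.interruption_cost
        return None, 0.0


# Toy usage: humans are likely to be found near the office but rarely in
# the corridor, so a policy should prefer routes where queries can succeed.
model = HOPPOMDP(
    states=["office", "corridor"],
    actions=["forward", "back", "ask"],
    availability={"office": 0.8, "corridor": 0.1},
    interruption_cost=2.0,
)
obs, cost = model.query("office", true_obs="at_office")
print(obs, cost)
```

The key design point this sketch captures is that a query action has an uncertain outcome tied to availability, unlike a conventional probabilistic sensor that always returns some reading; this is the difference the HOP-POMDP planning algorithms must account for.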
