Planning to Give Information in Partially Observed Domains with a Learned Weighted Entropy Model

In many robotic applications, an autonomous agent must act in and explore a partially observed environment that its human teammate cannot observe. We consider a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. Naturally, the human will have preferences about what information they are given. This work adopts an information-theoretic view of the human's preferences: the human scores a piece of information by the change it induces in the weighted entropy of their belief about the environment state. We formulate this setting as a belief MDP and give an algorithm for solving it approximately. We then give an algorithm that allows the agent to learn the human's preferences online. We validate our approach experimentally in simulated discrete and continuous partially observed search-and-recover domains.
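
To make the scoring model concrete, the sketch below computes the weighted entropy of a discrete belief, H_w(b) = -sum_s w(s) b(s) log b(s), and scores a message by the reduction in weighted entropy it induces in the human's belief. This is a minimal illustration of the underlying quantity, not the paper's implementation; the function names, the example numbers, and the sign convention (positive score means the message reduced weighted entropy) are assumptions.

```python
import numpy as np

def weighted_entropy(belief, weights):
    """Weighted entropy H_w(b) = -sum_s w(s) * b(s) * log b(s).

    `belief` is a discrete distribution over environment states;
    `weights` encodes how much the human cares about resolving
    uncertainty in each state.
    """
    b = np.asarray(belief, dtype=float)
    w = np.asarray(weights, dtype=float)
    nz = b > 0  # convention: 0 * log 0 = 0
    return -np.sum(w[nz] * b[nz] * np.log(b[nz]))

def information_score(prior, posterior, weights):
    """Score a piece of information by the reduction in weighted
    entropy it induces when the human updates prior -> posterior."""
    return weighted_entropy(prior, weights) - weighted_entropy(posterior, weights)

# Hypothetical example: two states, with the human caring far more
# about uncertainty in state 0 than in state 1.
prior = [0.5, 0.5]
posterior = [0.9, 0.1]   # human's belief after the agent's message
weights = [1.0, 0.1]
print(information_score(prior, posterior, weights))
```

Under the assumed weights, a message that concentrates belief on the high-weight state scores higher than one inducing the same entropy reduction on a low-weight state, which is what distinguishes weighted entropy from the ordinary Shannon entropy.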
