Learning What Information to Give in Partially Observed Domains

In many robotic applications, an autonomous agent must act in and explore a partially observed environment that its human teammate cannot observe. We consider a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. In this work, we address the algorithmic question of how the agent should plan what actions to take and what information to transmit. Naturally, one would expect the human to have preferences about the information they receive; we model these preferences information-theoretically, scoring transmitted information by the change it induces in the weighted entropy of the human's belief state. We formulate this setting as a belief MDP and give a tractable algorithm for solving it approximately. We then give an algorithm that allows the agent to learn the human's preferences online, through exploration. We validate our approach experimentally in simulated discrete and continuous partially observed search-and-recover domains. Visit this http URL for a supplementary video.
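
To make the scoring concrete, here is a minimal Python sketch for a discrete belief, assuming the standard definition of weighted entropy, H_w(b) = -sum_s w(s) b(s) log b(s), and assuming that a declarative message is modeled as conditioning the belief on the set of states consistent with it. The function names (weighted_entropy, information_score), the consistent-state mask, and the specific numbers are illustrative assumptions, not taken from the paper.

    import numpy as np

    def weighted_entropy(belief, weights):
        """Weighted entropy H_w(b) = -sum_s w(s) * b(s) * log b(s).

        belief is a probability vector over states; weights encodes the
        human's state-dependent preferences (both names are hypothetical).
        """
        b = np.asarray(belief, dtype=float)
        w = np.asarray(weights, dtype=float)
        nz = b > 0  # by convention, 0 * log 0 = 0
        return -np.sum(w[nz] * b[nz] * np.log(b[nz]))

    def information_score(belief, weights, consistent_mask):
        """Score a declarative message as the reduction in weighted entropy.

        consistent_mask marks the states consistent with the transmitted
        information; the human's belief is conditioned on that event.
        """
        b = np.asarray(belief, dtype=float)
        mask = np.asarray(consistent_mask, dtype=float)
        posterior = b * mask
        total = posterior.sum()
        if total == 0:
            raise ValueError("message is inconsistent with every believed state")
        posterior = posterior / total  # Bayesian conditioning on the message
        return weighted_entropy(b, weights) - weighted_entropy(posterior, weights)

    if __name__ == "__main__":
        belief = [0.4, 0.3, 0.2, 0.1]   # agent's estimate of the human's belief
        weights = [2.0, 2.0, 0.5, 0.5]  # human cares more about the first two states
        msg_a = [1, 1, 0, 0]  # e.g., "the target is in one of the first two locations"
        msg_b = [0, 0, 1, 1]  # e.g., "the target is in one of the last two locations"
        print(information_score(belief, weights, msg_a))
        print(information_score(belief, weights, msg_b))

In this toy example, the message that concentrates the belief onto the high-weight states yields a larger weighted-entropy reduction, which is the sense in which the human's preferences shape what is worth transmitting.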
