Selecting Operator Queries Using Expected Myopic Gain

When its human operator cannot continuously supervise (much less teleoperate) an agent, the agent should be able to recognize its limitations and ask for help when it risks making autonomous decisions that could significantly surprise and disappoint the operator. Inspired by previous research on making exploration-exploitation tradeoff decisions and on inverse reinforcement learning, we develop Expected Myopic Gain (EMG), a Bayesian approach in which an agent explicitly models both its own uncertainty and how possible operator responses to queries could improve its decisions. With EMG, an agent can weigh the relative expected utilities of seeking operator help versus acting autonomously. We provide conditions under which EMG is optimal, along with preliminary empirical results on simple domains showing that EMG can perform well even when its optimality conditions are violated.
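To make the query-versus-act tradeoff concrete, the following is a minimal sketch of the EMG idea, not the paper's actual formulation. It assumes details the abstract does not specify: a discrete belief over candidate world models, known Q-values per model at the current state, a known likelihood of each operator answer under each model, and a fixed query cost. All names (`value_of_acting`, `expected_myopic_gain`, `QUERY_COST`) are illustrative.

```python
# Hypothetical sketch of Expected Myopic Gain (EMG): compare the expected
# value of acting under the current belief with the expected value of acting
# after observing the operator's answer to a query. Assumed, not from the
# paper: discrete models, per-model Q-values, answer likelihoods, query cost.

from typing import Dict, List

def value_of_acting(belief: Dict[str, float],
                    Q: Dict[str, Dict[str, float]]) -> float:
    """Best expected action value at the current state under `belief`.

    belief: P(model) over candidate models of the world.
    Q: Q[model][action] = value of `action` if `model` is the true model.
    """
    actions = next(iter(Q.values())).keys()
    return max(sum(belief[m] * Q[m][a] for m in belief) for a in actions)

def posterior(belief: Dict[str, float],
              likelihood: Dict[str, Dict[str, float]],
              answer: str) -> Dict[str, float]:
    """Bayes update of P(model) given an operator answer to the query."""
    unnorm = {m: belief[m] * likelihood[m][answer] for m in belief}
    z = sum(unnorm.values())
    return {m: p / z for m, p in unnorm.items()}

def expected_myopic_gain(belief: Dict[str, float],
                         Q: Dict[str, Dict[str, float]],
                         likelihood: Dict[str, Dict[str, float]],
                         answers: List[str]) -> float:
    """EMG of one query: expected post-answer value minus pre-answer value."""
    v_now = value_of_acting(belief, Q)
    v_after = 0.0
    for ans in answers:
        p_ans = sum(belief[m] * likelihood[m][ans] for m in belief)
        if p_ans > 0:
            v_after += p_ans * value_of_acting(
                posterior(belief, likelihood, ans), Q)
    return v_after - v_now

# Usage: query only when the expected gain exceeds the cost of asking.
belief = {"m1": 0.5, "m2": 0.5}
Q = {"m1": {"left": 1.0, "right": 0.0},
     "m2": {"left": 0.0, "right": 1.0}}
likelihood = {"m1": {"yes": 0.9, "no": 0.1},
              "m2": {"yes": 0.1, "no": 0.9}}
QUERY_COST = 0.05  # assumed fixed cost per query
if expected_myopic_gain(belief, Q, likelihood, ["yes", "no"]) > QUERY_COST:
    print("query the operator")
else:
    print("act autonomously")
```

In this toy run the two models disagree about which action is good, so an informative answer raises the expected value of the best action from 0.5 to 0.9; the gain of 0.4 exceeds the assumed query cost, and the agent asks rather than acting autonomously.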
