Multi-view Sequential Games: The Helper-Agent Problem

Problems in which agents wish to cooperate toward a common goal but disagree in their views of reality are frequent. Of particular interest are settings where one agent is an AI ``helper agent'' and the other is a human. The AI wants to help the human complete a task, but the AI and the human may disagree about the world model, for example because of the human's limited rationality and biases, or because of misaligned reward models. In this paper, we formalize this setting as the multi-view sequential game and show that, even when the human's model is far from correct, an AI can still steer the human's behavior toward more beneficial outcomes. In particular, we develop a number of algorithms, based on dynamic programming, that discover helper policies for the AI under different assumptions about the AI's knowledge. Experimentally, we show that the AI's beliefs about the human's model need not be accurate for the AI to act as a useful helper agent.
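
To make the dynamic-programming idea concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: it assumes the human plans by finite-horizon value iteration in its own (possibly wrong) model of the dynamics and rewards, while the AI helper best-responds to that predicted human policy by backward induction under what the AI takes to be the true model. The function names, array shapes, and the fixed-human-policy simplification are assumptions made for illustration.

```python
# Illustrative two-model dynamic program (hypothetical formulation):
# the human optimizes in its believed MDP; the AI helper then plans
# against the predicted human policy in the true MDP.
import numpy as np

def human_policy(P_h, R_h, horizon):
    """Finite-horizon value iteration in the human's believed MDP.
    P_h: (S, A_h, S) believed transitions, R_h: (S, A_h) believed rewards.
    Returns pi_h[t, s] = human action at time t in state s."""
    S, A_h, _ = P_h.shape
    V = np.zeros(S)
    pi_h = np.zeros((horizon, S), dtype=int)
    for t in reversed(range(horizon)):
        Q = R_h + P_h @ V          # Q[s, a_h] under the human's model
        pi_h[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi_h

def helper_policy(P, R, pi_h, horizon):
    """Backward induction for the AI helper in the (assumed) true model,
    holding the predicted human policy fixed.
    P: (S, A_ai, A_h, S) true transitions, R: (S, A_ai, A_h) true rewards."""
    S, A_ai, A_h, _ = P.shape
    V = np.zeros(S)
    pi_ai = np.zeros((horizon, S), dtype=int)
    for t in reversed(range(horizon)):
        Q = np.zeros((S, A_ai))
        for s in range(S):
            a_h = pi_h[t, s]       # anticipated human action at (t, s)
            Q[s] = R[s, :, a_h] + P[s, :, a_h] @ V
        pi_ai[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi_ai, V
```

Holding the human policy fixed is only one simplifying assumption; under other assumptions about the AI's knowledge (e.g., uncertainty over the human's model), the backup over human responses would change accordingly.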
