Value-Based Policy Teaching with Active Indirect Elicitation

Many situations arise in which an interested party's utility depends on the actions of an agent; e.g., a teacher is interested in a student learning effectively, and a firm is interested in a consumer's behavior. We consider an environment in which the interested party can provide incentives to affect the agent's actions but cannot otherwise enforce actions. In value-based policy teaching, we situate this problem within the framework of sequential decision tasks modeled by Markov Decision Processes, and seek to associate limited rewards with states so as to induce the agent to follow a policy that maximizes the interested party's total expected value. We show that value-based policy teaching is NP-hard and provide a mixed integer program formulation. Focusing in particular on environments in which the agent's reward is unknown to the interested party, we provide a method for active indirect elicitation, wherein the agent's reward function is inferred from observations of its response to incentives. Experimental results suggest that we can generally find the optimal incentive provision in a small number of elicitation rounds.
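The abstract does not spell out the model, so the following is only a minimal Python sketch of the problem setting under our own assumptions: a finite MDP with state-based rewards, an agent that best-responds to its reward plus a per-state incentive Δ, incentives drawn from a small discrete grid whose total must stay within a budget (a simplification of "limited rewards"), and a finite set of candidate agent rewards assumed to contain the true one. Brute-force search over Δ stands in for the paper's mixed integer program, and the elicitation loop is a simple optimistic rule of our own, not the paper's method. All function names (`greedy_policy`, `party_value`, `best_incentive`, `elicit`) are hypothetical.

```python
from itertools import product

TOL = 1e-9  # tolerance for treating two Q-values (or two plan values) as equal


def greedy_policy(P, R, gamma, iters=500):
    """Value iteration for state rewards R; returns a deterministic policy.

    P[s][a] is a list of (probability, next_state) pairs. Ties are broken
    toward the lowest-index action (an assumption about the agent).
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [R[s] + gamma * max(sum(p * V[t] for p, t in P[s][a])
                                for a in range(len(P[s])))
             for s in range(n)]
    policy = []
    for s in range(n):
        qs = [sum(p * V[t] for p, t in P[s][a]) for a in range(len(P[s]))]
        best_q = max(qs)
        policy.append(next(a for a, q in enumerate(qs) if q >= best_q - TOL))
    return tuple(policy)


def party_value(P, policy, G, gamma, start, iters=500):
    """Interested party's expected discounted reward G when the agent follows policy."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [G[s] + gamma * sum(p * V[t] for p, t in P[s][policy[s]])
             for s in range(len(P))]
    return V[start]


def best_incentive(P, R, G, gamma, start, levels, budget):
    """Brute-force search over per-state incentives (exponential in |S|)."""
    best = (float("-inf"), None, None)
    for delta in product(levels, repeat=len(P)):
        if sum(delta) > budget:
            continue  # hypothetical budget constraint on total incentive
        pi = greedy_policy(P, [r + d for r, d in zip(R, delta)], gamma)
        val = party_value(P, pi, G, gamma, start)
        if val > best[0] + TOL:
            best = (val, delta, pi)
    return best  # (value, incentive vector, induced policy)


def elicit(P, candidates, true_R, G, gamma, start, levels, budget, max_rounds=20):
    """Optimistic elicitation over a finite hypothesis set assumed to contain true_R."""
    remaining = list(candidates)
    for rnd in range(1, max_rounds + 1):
        plans = [best_incentive(P, R, G, gamma, start, levels, budget) for R in remaining]
        # Pick the candidate promising the highest value (first one on ties).
        k = max(range(len(plans)), key=lambda i: plans[i][0])
        value, delta, predicted = plans[k]
        # The real agent best-responds to its hidden reward plus delta; only its
        # policy is observed, never true_R itself.
        observed = greedy_policy(P, [r + d for r, d in zip(true_R, delta)], gamma)
        if observed == predicted:
            return delta, value, rnd
        # Keep only candidates that would have produced the observed policy.
        remaining = [R for R in remaining
                     if greedy_policy(P, [r + d for r, d in zip(R, delta)], gamma) == observed]
    return None


# Toy MDP: from state 0, action 0 leads to absorbing state 1 and action 1 to
# absorbing state 2. The agent prefers state 1; the interested party wants state 2.
P = [[[(1.0, 1)], [(1.0, 2)]], [[(1.0, 1)]], [[(1.0, 2)]]]
G = [0, 0, 1]
candidates = [[0, 0.5, 0], [0, 1, 0], [0, 3, 0]]
result = elicit(P, candidates, true_R=[0, 1, 0], G=G, gamma=0.9, start=0,
                levels=[0, 1, 2], budget=2)
print(result)
```

In the toy run, round 1 offers Δ = (0, 0, 1), which the candidate [0, 0.5, 0] predicts is enough; the real agent is left indifferent, stays on its tie-breaking action toward state 1, and that candidate is discarded. Round 2 offers Δ = (0, 0, 2), the agent moves to state 2, and the loop stops. Choosing the most optimistic remaining candidate is what makes the stopping rule sound in this sketch: if the true reward is in the set, no feasible Δ can beat that candidate's promised value, so an incentive that delivers it is optimal.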
