Enabling Environment Design via Active Indirect Elicitation

Many situations arise in which an interested party wishes to affect the decisions of an agent; e.g., a teacher who seeks to promote particular study habits, a Web 2.0 site that seeks to encourage users to contribute content, or an online retailer that seeks to encourage consumers to write reviews. In the problem of environment design, one assumes an interested party who is able to alter limited aspects of the environment in order to promote desirable behaviors. A critical aspect of environment design is understanding the agent's preferences, yet by assumption the agent cannot be queried directly. We work in the inverse reinforcement learning framework and adopt the idea of active indirect preference elicitation: the agent's reward function is learned by observing its behavior in response to incentives. We show that the elicitation process converges and obtain bounds on the number of elicitation rounds. We briefly discuss generalizations of the elicitation method to other forms of environment design, e.g., modifying the state space, the transition model, and the available actions.
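To make the idea of learning from behavior in response to incentives concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a toy single-state setting in which the agent simply picks the action with the highest reward plus incentive, and it assumes the true reward lies in a finite set of candidate reward vectors. All names (`best_action`, `incentive_for`, `elicit`) and the `margin` parameter are hypothetical and introduced only for illustration.

```python
from itertools import product


def best_action(reward, incentive):
    """Agent model (assumed): pick the action maximizing reward + incentive.

    Ties are broken toward the lowest action index.
    """
    totals = [r + d for r, d in zip(reward, incentive)]
    return max(range(len(totals)), key=lambda a: (totals[a], -a))


def incentive_for(guess, target, margin=0.5):
    """Smallest bonus on `target` that makes it strictly optimal under `guess`."""
    others = [r for a, r in enumerate(guess) if a != target]
    bonus = max(0.0, max(others) - guess[target] + margin)
    incentive = [0.0] * len(guess)
    incentive[target] = bonus
    return incentive


def elicit(true_reward, candidates, target, margin=0.5):
    """Offer incentives, observe the agent, and prune inconsistent candidates.

    Returns the incentive that induced `target` and the number of rounds used.
    The interested party never reads `true_reward` directly; it is only used
    to simulate the agent's observed response.
    """
    consistent = list(candidates)
    rounds = 0
    while consistent:
        rounds += 1
        guess = consistent[0]  # any candidate still consistent with behavior
        incentive = incentive_for(guess, target, margin)
        observed = best_action(true_reward, incentive)  # indirect "query"
        if observed == target:
            return incentive, rounds
        # Keep only candidates that would have produced the observed action.
        # The current guess predicted `target`, so it is always eliminated.
        consistent = [c for c in consistent
                      if best_action(c, incentive) == observed]
    raise ValueError("true reward is not among the candidates")


# Hypothetical example: 3 actions, integer rewards in {0, 1, 2}.
candidates = list(product(range(3), repeat=3))
true_reward = (2, 0, 1)
incentive, rounds = elicit(true_reward, candidates, target=1)
print(incentive, rounds)
```

In this sketch every unsuccessful round removes at least the current guess from the consistent set, so the loop ends after at most as many rounds as there are candidates. This only illustrates why such a process can converge; it does not reproduce the bounds claimed in the abstract.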
