Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making

It is often argued that an agent making decisions on behalf of two or more principals who have different utility functions should adopt a {\em Pareto-optimal} policy, i.e., a policy that cannot be improved upon for one principal without making sacrifices for another. A famous theorem of Harsanyi shows that, when the principals share a common prior on the outcome distributions of all policies, a Pareto-optimal policy for the agent is one that maximizes a fixed, weighted linear combination of the principals' utilities. In this paper, we show that Harsanyi's theorem does not hold for principals with different priors, and derive a more precise generalization that does hold; this generalization constitutes our main result. In this more general case, the relative weight given to each principal's utility should evolve over time according to how well the agent's observations conform with that principal's prior. The result has implications for the design of contracts, treaties, joint ventures, and robots.
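The weight-evolution idea in the abstract can be illustrated with a minimal sketch. The setup below is a toy example, not the paper's formal construction: two principals hold different point priors over the bias of a coin, and each principal's weight is rescaled by the likelihood of the observed coin flips under that principal's prior, so the principal whose prior better predicts the data gains influence over time.

```python
# Toy illustration (assumed setup, not the paper's construction): two
# principals with different priors over a coin's bias. The agent rescales
# each principal's weight by how well observations conform with that
# principal's prior.

def likelihood(bias, observations):
    """Probability of a heads(1)/tails(0) sequence under a given coin bias."""
    p = 1.0
    for obs in observations:
        p *= bias if obs == 1 else (1.0 - bias)
    return p

def updated_weights(initial_weights, biases, observations):
    """Multiply each weight by the likelihood of the data under that
    principal's prior (a point prior on the bias), then renormalize."""
    raw = [w * likelihood(b, observations)
           for w, b in zip(initial_weights, biases)]
    total = sum(raw)
    return [r / total for r in raw]

# Principal A believes the coin is fair; principal B believes it favors heads.
w0 = [0.5, 0.5]
biases = [0.5, 0.9]
obs = [1, 1, 1, 1, 0]  # mostly heads
w = updated_weights(w0, biases, obs)
# B's prior explains the data better, so B's weight grows past A's.
```

Under this update, the weighted objective the agent maximizes drifts toward the principal whose prior has been vindicated by the observation history, which is the qualitative behavior the abstract describes.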

[1] J. Nash. The Bargaining Problem, 1950, Classics in Game Theory.

[2] J. Harsanyi. Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility, 1955.

[3] R. Myerson. Incentive Compatibility and the Bargaining Problem, 1979.

[4] M. Satterthwaite et al. Efficient Mechanisms for Bilateral Trading, 1983.

[5] Peter Norvig et al. Artificial Intelligence: A Modern Approach, 1995.

[6] Csaba Szepesvári et al. Multi-criteria Reinforcement Learning, 1998, ICML.

[7] Stuart J. Russell. Learning Agents for Uncertain Environments (extended abstract), 1998, COLT '98.

[8] Andrew Y. Ng et al. Algorithms for Inverse Reinforcement Learning, 2000, ICML.

[9] Pieter Abbeel et al. Apprenticeship Learning via Inverse Reinforcement Learning, 2004, ICML.

[10] Sean R. Eddy. What Is Dynamic Programming?, 2004, Nature Biotechnology.

[11] Yoav Shoham et al. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2009.

[12] Adnan Darwiche. Modeling and Reasoning with Bayesian Networks, 2009.

[13] Yiannis Demiris et al. Evolving Policies for Multi-Reward Partially Observable Markov Decision Processes (MR-POMDPs), 2011, GECCO '11.

[14] Julie A. Shah et al. Fairness in Multi-Agent Sequential Decision-Making, 2014, NIPS.

[15] Nick Bostrom. Superintelligence: Paths, Dangers, Strategies, 2014.

[16] Weijia Wang. Multi-objective Sequential Decision Making, 2014.

[17] Shimon Whiteson et al. Point-Based Planning for Multi-Objective POMDPs, 2015, IJCAI.

[18] Anca D. Dragan et al. Cooperative Inverse Reinforcement Learning, 2016, NIPS.