Impossibility of deducing preferences and rationality from human policy

Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from observed behaviour. However, human planning systematically deviates from rationality. Though some IRL work assumes humans are noisily rational, there has been little analysis of the general problem of inferring the reward of a human of unknown rationality. Observed behaviour can, in principle, be decomposed into two components: a reward function and a planning algorithm that maps the reward function to a policy. Both of these components have to be inferred from behaviour. This paper presents a "No Free Lunch" theorem in this area, showing that, without making normative assumptions beyond the data, nothing about the human reward function can be deduced from human behaviour. Unlike most No Free Lunch theorems, this cannot be alleviated by regularising with simplicity assumptions: the simplest hypotheses are generally degenerate. The paper then sketches how one might begin to use normative assumptions to get around the problem; without such assumptions, solving the general IRL problem is impossible. The reward function and planning algorithm formalism can also be used to encode what it means for an agent to manipulate or override human preferences.
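
As a minimal, informal illustration of the decomposition described above (a sketch, not taken from the paper): the Python snippet below builds three different (planner, reward) pairs for a one-shot decision problem, namely a rational planner with reward R, an anti-rational planner with -R, and a planner that ignores its reward entirely. All three produce the same observed behaviour, so the behaviour alone cannot distinguish them. The action names and reward values are hypothetical.

```python
# Minimal sketch (assumed example, not from the paper): a one-shot decision
# problem in which several (planner, reward) decompositions are observationally
# equivalent.

import numpy as np

actions = ["a0", "a1", "a2"]
reward = np.array([1.0, 5.0, 2.0])   # hypothetical "true" reward over actions

def rational_planner(r):
    """Fully rational planner: picks the reward-maximising action."""
    return actions[int(np.argmax(r))]

def antirational_planner(r):
    """Anti-rational planner: picks the reward-minimising action."""
    return actions[int(np.argmin(r))]

def indifferent_planner(r, fixed_action="a1"):
    """Degenerate planner: ignores the reward and always outputs a fixed action."""
    return fixed_action

# Three very different (planner, reward) hypotheses...
candidates = [
    ("rational planner, reward R", rational_planner, reward),
    ("anti-rational planner, reward -R", antirational_planner, -reward),
    ("indifferent planner, reward 0", indifferent_planner, np.zeros_like(reward)),
]

# ...all producing the identical observed behaviour.
for name, planner, r in candidates:
    print(name, "->", planner(r))   # each line prints "-> a1"
```

Note that the degenerate pair (indifferent planner, zero reward) is also among the simplest hypotheses, which reflects the abstract's point that Occam-style simplicity regularisation does not resolve the ambiguity.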
