Cooperative Inverse Reinforcement Learning

For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human's reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm.
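As a rough illustration of the partial-information structure described above, the sketch below shows a robot keeping a belief over the human's unknown reward parameter, updating it by Bayes' rule after observing one human action, and then choosing the action with the highest expected human reward. It is not the paper's algorithm: the two-hypothesis reward table, the names `rewards`, `update_belief`, and `best_robot_action`, and the noisily rational (Boltzmann) human model with parameter `beta` are all assumptions made for this example.

```python
import math

# Hypothetical reward table: for each candidate reward parameter theta,
# the human's reward for action 0 and action 1. Both agents are scored
# by the human's reward; only the human knows which theta is true.
rewards = {"likes_A": [1.0, 0.0], "likes_B": [0.0, 1.0]}


def action_likelihood(action, theta, rewards, beta=5.0):
    """P(human picks `action` | theta), assuming a Boltzmann-rational human."""
    weights = [math.exp(beta * r) for r in rewards[theta]]
    return weights[action] / sum(weights)


def update_belief(belief, action, rewards, beta=5.0):
    """Bayes update of the robot's belief over theta after one human action."""
    unnormalized = {
        theta: prior * action_likelihood(action, theta, rewards, beta)
        for theta, prior in belief.items()
    }
    total = sum(unnormalized.values())
    return {theta: p / total for theta, p in unnormalized.items()}


def best_robot_action(belief, rewards):
    """Robot action that maximizes expected human reward under the belief."""
    n_actions = len(next(iter(rewards.values())))
    expected = [
        sum(belief[theta] * rewards[theta][a] for theta in belief)
        for a in range(n_actions)
    ]
    return max(range(n_actions), key=expected.__getitem__)


# The robot starts with a uniform prior and observes the human take action 1.
belief = {"likes_A": 0.5, "likes_B": 0.5}
belief = update_belief(belief, action=1, rewards=rewards)
robot_action = best_robot_action(belief, rewards)
```

The belief over the reward parameter plays the role of the hidden state in the POMDP reduction mentioned in the abstract. A full CIRL solution would also let the human choose actions for their teaching value, which this one-step sketch does not model.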
