Pragmatic-Pedagogic Value Alignment

As intelligent systems gain autonomy and capability, it becomes vital to ensure that their objectives match those of their human users; this is known as the value-alignment problem. In robotics, value alignment is key to the design of collaborative robots that can integrate into human workflows, successfully inferring and adapting to their users' objectives as they go. We argue that a meaningful solution to value alignment must combine multi-agent decision theory with rich mathematical models of human cognition, enabling robots to tap into people's natural collaborative capabilities. We present a solution to the cooperative inverse reinforcement learning (CIRL) dynamic game based on well-established cognitive models of decision making and theory of mind. The solution captures a key reciprocity relation: the human will not plan her actions in isolation, but rather reason pedagogically about how the robot might learn from them; the robot, in turn, can anticipate this and interpret the human's actions pragmatically. To our knowledge, this work constitutes the first formal analysis of value alignment grounded in empirically validated cognitive models.
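
To make the pedagogic-pragmatic reciprocity concrete, below is a minimal single-shot sketch in Python, in the spirit of rational-speech-acts-style recursive reasoning rather than the paper's full CIRL solution. All names (R, BETA, boltzmann, robot_posterior) and the toy numbers are illustrative assumptions: a level-0 human acts noisily rationally (Luce choice) given her objective theta, the robot forms a Bayesian posterior over theta from her action, a pedagogic human then plans against that posterior, and the robot in turn interprets her pragmatically.

import numpy as np

# Toy setup (assumed for illustration): 3 candidate objectives (theta)
# and 4 human actions. R[a, theta] is the reward of action a under theta.
R = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
])
BETA = 2.0  # Boltzmann (noisy-rationality) coefficient

def boltzmann(scores, beta=BETA):
    # Luce-choice / softmax distribution over actions (rows), per theta column.
    e = np.exp(beta * (scores - scores.max(axis=0, keepdims=True)))
    return e / e.sum(axis=0, keepdims=True)

def robot_posterior(pi_human, prior):
    # Bayesian update: P(theta | a) from P(a | theta) and P(theta).
    joint = pi_human * prior[None, :]
    return joint / joint.sum(axis=1, keepdims=True)

prior = np.ones(3) / 3

pi0 = boltzmann(R)                              # level-0 human: noisily rational
post_literal = robot_posterior(pi0, prior)      # robot's read of a literal human
pi1 = boltzmann(np.log(post_literal + 1e-12))   # pedagogic human plans against it
post_pragmatic = robot_posterior(pi1, prior)    # robot interprets pragmatically

print(post_literal.round(3))    # posterior over theta after each action
print(post_pragmatic.round(3))  # sharper: pedagogy and pragmatics reinforce

In this toy example the pragmatic posterior concentrates on the true objective faster than the literal one after every action, which is the reciprocity effect described above; the paper's contribution is to carry this reasoning into the sequential CIRL game rather than a single interaction.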
