Agent Incentives: A Causal Perspective

We present a framework for analysing agent incentives using causal influence diagrams. We establish that a well-known criterion for value of information is complete. We propose a new graphical criterion for value of control, establishing its soundness and completeness. We also introduce two new concepts for incentive analysis: response incentives indicate which changes in the environment affect an optimal decision, while instrumental control incentives establish whether an agent can influence its utility via a variable X. For both new concepts, we provide sound and complete graphical criteria. We show by example how these results can help with evaluating the safety and fairness of an AI system.
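As a rough illustration of what a graphical criterion of this kind can look like, the sketch below checks for an instrumental control incentive on a variable X. The abstract does not state the criterion, so the sketch assumes it is path-based for a single-decision diagram: the graph admits such an incentive on X when some directed path runs from the decision to a utility node through X. The function names, the adjacency-dict encoding, and the toy graph are hypothetical, chosen only for this example.

```python
from collections import deque


def reachable(graph, source):
    """Return all nodes reachable from `source` along directed edges (source included)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def admits_instrumental_control_incentive(graph, decision, x, utilities):
    """Assumed criterion: a directed path decision -> ... -> x -> ... -> some utility node.

    `graph` maps each node to a list of its children and is assumed to be acyclic.
    """
    if x not in reachable(graph, decision):
        return False  # the decision cannot influence x at all
    downstream_of_x = reachable(graph, x)
    return any(u in downstream_of_x for u in utilities)


# Hypothetical toy diagram: D influences U only via X; Z affects U but is not downstream of D.
toy_cid = {"D": ["X"], "X": ["U"], "Z": ["U"], "U": []}

on_x = admits_instrumental_control_incentive(toy_cid, "D", "X", ["U"])
on_z = admits_instrumental_control_incentive(toy_cid, "D", "Z", ["U"])
print(on_x, on_z)
```

In this toy graph the check succeeds for X, which lies between the decision and the utility, and fails for Z, which the decision cannot reach. The check is purely structural: it says whether the graph permits such an incentive, not whether a particular parameterisation produces one.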
