Bayesian Persuasion in Sequential Decision-Making

We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal’s advice/signal, and beliefs about the external parameter. The action of the agent updates the state according to a stochastic process. The model arises naturally in many applications, e.g., an app (the principal) can advise the user (the agent) on which action to choose based on additional real-time information the app has. We study the problem of designing a signaling strategy from the principal’s point of view. We show that the principal has an optimal strategy against a myopic agent, who only optimizes their rewards locally, and that this optimal strategy can be computed in polynomial time. In contrast, it is NP-hard to approximate an optimal policy against a far-sighted agent. Further, we show that if the principal has the power to threaten the agent by not providing future signals, then we can efficiently design a threat-based strategy. This strategy guarantees the principal’s payoff as if playing against an agent who is far-sighted but myopic to future signals.
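To make the myopic-agent setting concrete, the following is a minimal single-step sketch, not the paper's algorithm. It only evaluates a given signaling scheme: the agent updates its belief about the external parameter by Bayes' rule and picks the action with the highest expected immediate reward. All names, the two-value parameter, the rewards, the prior, and the tie-breaking rule in the principal's favor are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical one-step instance: parameter theta in {"good", "bad"},
# actions {"go", "stay"}. Exact fractions avoid floating-point ties.
prior = {"good": Fraction(3, 10), "bad": Fraction(7, 10)}
actions = ["go", "stay"]
agent_reward = {("good", "go"): 1, ("bad", "go"): -1,
                ("good", "stay"): 0, ("bad", "stay"): 0}
principal_reward = {("good", "go"): 1, ("bad", "go"): 1,
                    ("good", "stay"): 0, ("bad", "stay"): 0}

def myopic_best_response(belief):
    """Action maximizing the agent's expected immediate reward.
    Ties are broken in the principal's favor (an assumption)."""
    def expected(reward, a):
        return sum(belief[th] * reward[(th, a)] for th in belief)
    return max(actions,
               key=lambda a: (expected(agent_reward, a),
                              expected(principal_reward, a)))

def principal_value(scheme):
    """Principal's expected payoff under scheme[theta][signal] = P(signal | theta)."""
    signals = {s for dist in scheme.values() for s in dist}
    value = Fraction(0)
    for s in signals:
        joint = {th: prior[th] * scheme[th].get(s, Fraction(0)) for th in prior}
        prob = sum(joint.values())
        if prob == 0:
            continue  # this signal is never sent
        belief = {th: p / prob for th, p in joint.items()}  # Bayes update
        a = myopic_best_response(belief)
        value += sum(joint[th] * principal_reward[(th, a)] for th in prior)
    return value

# Three schemes: reveal nothing, reveal everything, and partial disclosure.
no_info = {"good": {"m": Fraction(1)}, "bad": {"m": Fraction(1)}}
full_info = {"good": {"go": Fraction(1)}, "bad": {"stay": Fraction(1)}}
partial = {"good": {"go": Fraction(1)},
           "bad": {"go": Fraction(3, 7), "stay": Fraction(4, 7)}}

values = {name: principal_value(s)
          for name, s in [("none", no_info), ("full", full_info), ("partial", partial)]}
```

In this toy instance, partial disclosure makes the agent exactly indifferent after the "go" signal, so the principal does better than under either extreme. The sequential model in the abstract additionally tracks a state that evolves with the agent's actions, which this sketch omits.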
