On the Equilibrium Elicitation of Markov Games Through Information Design

This work considers a novel information design problem and studies how crafting payoff-relevant environmental signals alone can influence the behaviors of intelligent agents. The agents’ strategic interactions are captured by an incomplete-information Markov game, in which each agent first selects one environmental signal from multiple signal sources as additional payoff-relevant information and then takes an action. A rational information designer (the designer) possesses one signal source and aims to control the agents’ equilibrium behaviors by designing the information structure of the signals she sends to them. We establish an obedient principle, which states that it is without loss of generality to focus on direct information design when the design incentivizes each agent to select the signal sent by the designer, so that the design process need not predict the agents’ strategic selection behaviors. We then introduce the design protocol for a given goal of the designer, referred to as obedient implementability (OIL), and characterize OIL in a class of obedient perfect Bayesian Markov Nash equilibria (O-PBME). We propose a new framework for information design based on maximizing the optimal slack variables. Finally, we formulate the designer’s goal selection problem and characterize it in terms of information design by establishing a relationship between the O-PBME and the Bayesian Markov correlated equilibria, building upon the revelation principle of classic information design in economics. The proposed approach can be applied to elicit desired behaviors of multi-agent systems in competitive as well as cooperative settings, and can be extended to heterogeneous stochastic games in complete- and incomplete-information environments.
