Learning to Persuade on the Fly: Robustness Against Ignorance

We study a repeated persuasion setting between a sender and a receiver, where at each time t, the sender shares information about a payoff-relevant state with the receiver. The state at each time t is drawn independently and identically from an unknown distribution, and after receiving information about it, the receiver (myopically) chooses an action from a finite set. The sender seeks to persuade the receiver into choosing actions that are aligned with her preference by selectively sharing information about the state. In contrast to the standard persuasion setting, we focus on the case where neither the sender nor the receiver knows the distribution of the payoff-relevant state. Instead, the sender learns this distribution over time by observing the state realizations. We adopt the assumption, common in the literature on Bayesian persuasion, that in each time period, before observing the realized state in that period, the sender commits to a signaling mechanism that maps each state to a possibly random action recommendation. After observing the state, the sender recommends an action according to the chosen signaling mechanism.
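The order of events in each period (commit, observe, recommend, update) can be sketched in Python as follows. This is only an illustration of the interaction protocol described above, not the learning algorithm itself: the function names, the dictionary representation of a signaling mechanism, and the placeholder `uninformative_mechanism` are all assumptions introduced here, and the empirical frequency estimate is one simple way the sender might track the unknown distribution.

```python
import random
from collections import Counter


def empirical_distribution(counts, states):
    """Empirical state frequencies; uniform before any observation."""
    total = sum(counts.values())
    if total == 0:
        return {s: 1.0 / len(states) for s in states}
    return {s: counts[s] / total for s in states}


def uninformative_mechanism(states, actions, estimate):
    """Hypothetical placeholder: recommend uniformly at random in every state.

    A real sender would choose the mechanism using `estimate`; this stub
    ignores it and only fixes the data layout: mechanism[state][action]
    is the probability of recommending `action` in `state`.
    """
    return {s: {a: 1.0 / len(actions) for a in actions} for s in states}


def recommend(mechanism, state, rng):
    """Sample an action recommendation from the committed mechanism."""
    actions = list(mechanism[state])
    weights = [mechanism[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def run(true_dist, actions, choose_mechanism, horizon, seed=0):
    """Simulate `horizon` periods of the commit-observe-recommend protocol."""
    rng = random.Random(seed)
    states = list(true_dist)
    counts = Counter()
    history = []
    for _ in range(horizon):
        # 1. The sender commits to a mechanism before seeing the state.
        estimate = empirical_distribution(counts, states)
        mechanism = choose_mechanism(states, actions, estimate)
        # 2. The state is drawn i.i.d. from the distribution unknown to her.
        state = rng.choices(states, weights=[true_dist[s] for s in states], k=1)[0]
        # 3. The sender recommends an action as per the committed mechanism.
        action = recommend(mechanism, state, rng)
        history.append((state, action))
        # 4. The sender updates her estimate with the realized state.
        counts[state] += 1
    return empirical_distribution(counts, states), history


if __name__ == "__main__":
    estimate, history = run(
        {"low": 0.3, "high": 0.7}, ["pass", "invest"], uninformative_mechanism, 1000
    )
    print(estimate)
```

The mechanism is passed in as a function so that any rule mapping the current estimate to a mechanism can be substituted for the placeholder without changing the protocol loop.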
