Paper information - Efficient Episodic Learning of Nonstationary and Unknown Zero-Sum Games Using Expert Game Ensembles

Game theory provides essential analysis in many applications involving strategic interactions. However, the questions of how to construct a game model and how faithful that model is are seldom addressed. In this work, we consider learning in a class of repeated zero-sum games with an unknown, time-varying payoff matrix and noisy feedback, making use of an ensemble of benchmark game models. These models can be pre-trained and collected dynamically during sequential play. They serve as prior side information and imperfectly underpin the unknown true game model. We propose OFULinMat, an episodic learning algorithm that integrates the adaptive estimation of game models with the learning of strategies. The proposed algorithm is shown to achieve a sublinear bound on the saddle-point regret. We show that this algorithm is provably efficient through both theoretical analysis and numerical examples. We use a dynamic honeypot allocation game as a case study to illustrate and corroborate our results. We also discuss the relationship, and highlight the differences, between our framework and the classical adversarial multi-armed bandit framework.
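The abstract does not spell out how OFULinMat estimates the game, so the following is only an illustrative sketch and not the authors' algorithm. It assumes, purely for illustration, that the unknown payoff matrix is a fixed linear combination of two expert payoff matrices (the paper's setting is time-varying), and it recovers the mixing weights from noisy payoff observations by regularized least squares. The names `ridge_weights`, `experts`, `true_w`, and all numeric values are hypothetical.

```python
import random


def ridge_weights(features, targets, lam=1.0):
    """Regularized least squares: solve (lam*I + sum x x^T) w = sum y x."""
    k = len(features[0])
    # Accumulate the normal equations A w = b.
    A = [[lam if r == c else 0.0 for c in range(k)] for r in range(k)]
    b = [0.0] * k
    for x, y in zip(features, targets):
        for r in range(k):
            b[r] += y * x[r]
            for c in range(k):
                A[r][c] += x[r] * x[c]
    # Gaussian elimination with partial pivoting (A is positive definite for lam > 0).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * k
    for r in reversed(range(k)):
        s = b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))
        w[r] = s / A[r][r]
    return w


# Hypothetical ensemble of two 2x2 expert payoff matrices (row player's payoffs).
experts = [
    [[1.0, -1.0], [-1.0, 1.0]],
    [[0.0, 2.0], [1.0, -1.0]],
]
true_w = [0.7, 0.3]  # assumed (hidden) mixing weights of the true game

rng = random.Random(0)
features, targets = [], []
for _ in range(2000):
    # Both players act uniformly at random here; a learner would choose strategically.
    i, j = rng.randrange(2), rng.randrange(2)
    x = [M[i][j] for M in experts]  # expert predictions for entry (i, j)
    y = sum(w * v for w, v in zip(true_w, x)) + rng.gauss(0.0, 0.1)  # noisy payoff
    features.append(x)
    targets.append(y)

w_hat = ridge_weights(features, targets, lam=1.0)

# Estimated payoff matrix implied by the fitted weights.
est_matrix = [
    [sum(w * M[i][j] for w, M in zip(w_hat, experts)) for j in range(2)]
    for i in range(2)
]
print(w_hat)
print(est_matrix)
```

In this simplified setting the fitted weights should land close to the assumed `true_w`. An optimism-based method would additionally maintain a confidence set around such an estimate and choose strategies against it; that step is omitted here.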

Quanyan Zhu | Yunian Pan

[1] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 2022.

[2] Quanyan Zhu, et al. MASAGE: Model-Agnostic Sequential and Adaptive Game Estimation, 2020, GameSec.

[3] Tor Lattimore, et al. Stochastic matrix games with bandit feedback, 2020, ArXiv.

[4] Michael P. Wellman. Methods for Empirical Game-Theoretic Analysis, 2006, AAAI.

[5] Ambuj Tewari, et al. Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles, 2019, AISTATS.

[6] Csaba Szepesvári, et al. Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems, 2011, ArXiv.

[7] Felipe Caro, et al. Robust control of the multi-armed bandit problem, 2014, Annals of Operations Research.

[8] Quanyan Zhu, et al. A Game-theoretic Taxonomy and Survey of Defensive Deception for Cybersecurity and Privacy, 2017, ACM Comput. Surv.

[9] Mitsuo Kawato, et al. Multiple Model-Based Reinforcement Learning, 2002, Neural Computation.

[10] Quanyan Zhu, et al. Game Theory for Cyber Deception, 2021, Static & Dynamic Game Theory: Foundations & Applications.

[11] Maryam Kamgarpour, et al. Contextual Games: Multi-Agent Learning with Side Information, 2021, NeurIPS.

[12] Gábor Lugosi, et al. Prediction, learning, and games, 2006.

[13] Quanyan Zhu, et al. Game theory meets network security and privacy, 2013, CSUR.