Exploiting exploration strategies in repeated normal form security games
[1] Ali E. Abbas, et al. Maximum Entropy Utility, 2004, Oper. Res.
[2] J. Gittins. Bandit Processes and Dynamic Allocation Indices, 1979.
[3] E. Rasmusen. Games and Information, 1989.
[4] Yoav Shoham, et al. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2009.
[5] P. W. Jones, et al. Bandit Problems: Sequential Allocation of Experiments, 1987.
[6] H. P. Young, et al. On the Impossibility of Predicting the Behavior of Rational Agents, 2001, Proceedings of the National Academy of Sciences of the United States of America.
[7] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[8] H. Robbins. Some Aspects of the Sequential Design of Experiments, 1952.
[9] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[10] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.
[11] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[12] E. Jaynes. On the Rationale of Maximum-Entropy Methods, 1982, Proceedings of the IEEE.