Synthesis of strategies from interaction traces

We describe how to take a set of interaction traces produced by different pairs of players in a two-player repeated game and combine them into a composite strategy, and we provide a polynomial-time algorithm that generates the best such composite strategy. We also describe how to incorporate the composite strategy into an existing agent as an enhancement of the agent's original strategy. We provide experimental results using interaction traces from 126 agents (most of them written by students as class projects) for the Iterated Prisoner's Dilemma (IPD), the Iterated Chicken Game (ICG), and the Iterated Battle of the Sexes (IBS). We compared each agent with the enhanced version of that agent produced by our algorithm. The enhancements improved the agents' scores by about 5% in the IPD, 11% in the ICG, and 26% in the IBS, and improved their ranks by about 12% in the IPD, 38% in the ICG, and 33% in the IBS.
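To make the general idea concrete, here is a minimal sketch in Python of one way traces could be combined into a composite strategy: each trace is aggregated into a table keyed by the joint-move history, and at play time the agent plays the move with the highest observed average payoff for the longest recognized suffix of the current history. This is an illustrative assumption, not the paper's actual algorithm; the payoff table, the trace representation, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical IPD payoff matrix: payoff to the row player
# for each (my_move, opponent_move) pair.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def build_composite(traces):
    """Aggregate traces into a composite strategy.

    Each trace is a list of (my_move, opponent_move) pairs. For every
    observed history prefix, record the payoff each follow-up move
    earned, then keep the move with the highest mean payoff."""
    stats = defaultdict(lambda: defaultdict(list))
    for trace in traces:
        for i, (my_move, opp_move) in enumerate(trace):
            history = tuple(trace[:i])
            stats[history][my_move].append(PAYOFF[(my_move, opp_move)])
    return {
        h: max(moves, key=lambda m: sum(moves[m]) / len(moves[m]))
        for h, moves in stats.items()
    }

def next_move(composite, history, default="C"):
    """Play the composite strategy's move for the longest recognized
    suffix of the current history; fall back to a default move."""
    for start in range(len(history) + 1):
        suffix = tuple(history[start:])
        if suffix in composite:
            return composite[suffix]
    return default

# Example: two short traces against different opponents.
traces = [
    [("C", "C"), ("C", "D"), ("D", "D")],
    [("D", "C"), ("D", "D")],
]
composite = build_composite(traces)
print(next_move(composite, [("C", "C")]))  # -> "C"
print(next_move(composite, []))            # -> "D"
```

The suffix-matching fallback is one simple way to handle histories never seen in any trace; the paper's method of selecting among candidate continuations may differ.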
